Preface



All kinds of information from distant celestial bodies come to us in the form
of electromagnetic radiation. In most cases the propagation of this radiation 
can be described, as a reasonable approximation, in terms of rays. This is 
true not only in the optical range but also in the radio range of the electro- 
magnetic spectrum. For this reason the laws of ray optics are of fundamental 
importance for astronomy, astrophysics, and cosmology. 

According to general relativity, light rays are the light-like geodesics of a 
Lorentzian metric by which the spacetime geometry is described. This, however,
is true only as long as the light rays propagate freely under the sole
influence of the gravitational field, which is encoded in the spacetime
geometry. If a light ray is influenced, in addition, by an optical medium
(e.g., by a plasma), then it will not follow a light-like geodesic of the
spacetime metric. It is true that for electromagnetic radiation traveling
through the universe the influence of a medium on the path of the ray and on
the frequency is usually small. However, there are several cases in which this
influence is clearly measurable, in particular in the radio range. For example,
the deflection of radio rays in the gravitational field of the Sun is
considerably influenced by the Solar corona. Moreover, current and planned Doppler
experiments with microwaves in the Solar system reach an accuracy in the 
frequency of $\Delta\omega/\omega = 10^{-15}$ which makes it necessary to take the influence
of the interplanetary medium into account. Finally, even in cases where the 
quantitative influence of the medium is negligibly small it is interesting to 
ask in which way the qualitative aspects of the theory are influenced by the 
medium. The latter remark applies, in particular, to the intriguing theory of 
gravitational lensing. 

Unfortunately, general-relativistic light propagation in media is not usu- 
ally treated in standard textbooks, and the more specialized literature is 
concentrated on particular types of media and on particular applications 
rather than on general methodology. In this sense a comprehensive review of 
general-relativistic ray optics in media would fill a gap in the literature. It is 
the purpose of this monograph to provide such a review. 

Actually, this monograph grew out of a more specific idea. It was my original
plan to write a review on variational principles for light rays in
general-relativistic media, and to give some applications to astronomy and
astrophysics, in particular to the theory of gravitational lensing. However, I soon
realized the necessity of precisely formulating the mathematical theory of 
light rays in general before I could tackle the question of whether these light 
rays are characterized by a variational principle. The sections on variational 
principles and on applications are now at the end of Part II, in which a gen- 
eral mathematical framework for ray optics is set up. This is written in the 
language of symplectic geometry, thereby elucidating the well-known analogy 
between ray optics and the phase-space formulation of classical mechanics. 

Moreover, I found it desirable to also treat the question of how to derive 
ray optics as an approximation scheme from Maxwell’s equations. This is 
the topic of Part I which serves the purpose of physically motivating the 
fundamental definitions of Part II. In vacuo, the passage from Maxwell’s 
equations to ray optics is, of course, an elementary textbook matter and the 
generalization to isotropic and non-dispersive media is quite straightforward. 
However, for anisotropic and/or dispersive media this passage is more subtle. 
In Part I two types of media are discussed in detail, viz., an anisotropic one 
and a dispersive one, and the emphasis is on general methodology. 

I have organized the material in such a way that it should be possible to 
read Part II without having read Part I. This is not recommended, of course, 
but the reader might wish to do so. Both parts begin with an introductory 
section containing a brief guide to the literature and a statement of assump- 
tions and notations used throughout. Whenever the reader feels that a symbol 
needs explanation or that the underlying assumptions are not clearly stated, 
he or she should consult the introductory section of the respective part. Also, 
the index might be of help if problems of that kind occur. 

Large parts of this monograph present material which, in essence, is not 
new. However, I hope that the formulation chosen here might give some new 
insight. As to Part I, our discussion of the passage from Maxwell’s equations 
to ray optics includes several mathematical details which are difficult to find 
in the literature, although the general features are certainly known to experts. 
To mention just one example, it is certainly known to experts that in a linear 
but anisotropic medium on a general-relativistic spacetime the light rays are 
determined by two “optical Finsler metrics”; to the best of my knowledge,
however, a full proof of this fact is given here for the first time. As to Part II, 
the basic formalism is just the 170-year-old Hamiltonian optics, rewritten in 
modern mathematical terminology and adapted to the framework of general 
relativity. However, the presentation is based on some general mathematical 
definitions which have not been used before. This remark applies, in partic- 
ular, to Definition 5.1.1, which is the definition of what I call “ray-optical 
structures”. This definition formalizes the widely accepted idea that all of 
ray optics can be derived from a “dispersion relation”. (The term “ray system”
is sometimes used by Vladimir Arnold and his collaborators in a similar
though not quite identical sense.) It also applies, e.g., to Definition 5.4.1, on 
“dilation-invariant” ray-optical structures, which characterizes dispersion-free
media in a geometric way. 

On the other hand, I want to direct the reader’s attention to the fact that 
this monograph contains some particular results which, as far as I know, have 
not been known before. These include, e.g.: 

• the general redshift formula for light rays in media on a general-relativistic 
spacetime in Sect. 6.2; 

• the results on light bundles in isotropic non-dispersive media on a general- 
relativistic spacetime in Sect. 6.4, in particular the generalized “reciprocity 
theorem” (Theorem 6.4.3); 

• Theorem 7.3.1, which can be viewed as a version of Fermat’s principle 
for light rays in (possibly anisotropic and dispersive) media on general- 
relativistic spacetimes; 

• Theorem 7.5.4, which generalizes the “Morse index theorem” of Rieman- 
nian geometry to the case of light rays in stationary media on stationary 
general-relativistic spacetimes. 

Some of the questions raised in this monograph remain unanswered, i.e., to 
some extent this is an interim report on work in progress. In particular, this 
remark applies to the following two special issues. (a) In Part I we are able to
prove that for the linear medium treated in Chap. 2 ray optics is associated 
with approximate solutions of Maxwell’s equations, i.e., that ray optics gives 
a viable approximation scheme for electromagnetic radiation. Unfortunately, 
we are not able to prove a similar result for the plasma model of Chap. 3. 
This is a gap which should be filled in the future. (b) In Part II we are
able to establish a Morse index theorem for light rays in stationary media. 
However, it is still an open question whether these results can be generalized 
to the non-stationary case in which, up to now, a Morse theory exists only 
for vacuum rays. With Fermat’s principle in the form of Theorem 7.3.1 we 
have a starting point for setting up a Morse theory for light rays in arbitrary 
(non-stationary) media. This is an interesting problem to be tackled in future 
work. 

This monograph in its present form is a slightly revised version of my Ha- 
bilitation thesis. I would like to use this opportunity to thank the members of 
the Habilitation Committee, Karl-Eberhard Hellwig, Erwin Sedlmayr, Bernd 
Wegner, John Beem, Friedrich Wilhelm Hehl, and Gernot Neugebauer, for
their interest in this work and for several useful comments. In particular, I 
would like to thank Bernd Wegner for paving the way to having this text 
published with Springer-Verlag.

While working on this monograph I have profited from many discussions,
in particular with my academic teacher Karl-Eberhard Hellwig and his
collaborators at the Technical University in Berlin, but also with other
colleagues. Special thanks are due to Wolfgang Hasse and Marcus Kriele for
collaboration on various aspects of light propagation in general relativity; to
Wolfgang Rindler for hospitality at the University of Texas at Dallas and for
discussions on the fundamentals of general relativity; to John Beem for
hospitality at the University of Missouri at Columbia and for discussions on
Lorentzian geometry; to Gernot Neugebauer and his collaborators for hospitality
at the University of Jena and for discussions on various aspects of general
relativity; to Paolo Piccione, Fabio Giannoni, and Antonio Masiello for
hospitality during several visits to Italy and to Brazil and for collaboration
on Morse theory; and to Jürgen Ehlers and Arlie Petters for fruitful discussions
on Fermat’s principle and gravitational lensing. Also, I have enjoyed
discussions on this subject with students during seminars and classes in Berlin,
Osnabrück, and São Paulo.

Finally, I am grateful to the Deutsche Forschungsgemeinschaft for spon- 
soring this work with a Habilitation stipend, and to the Wigner Foundation, 
to the Deutscher Akademischer Austauschdienst, and to the Fundação de
Amparo à Pesquisa do Estado de São Paulo for financially supporting my
visits to Dallas, Columbia, and São Paulo.

Berlin, August 1999 



Volker Perlick 
1. Introduction to Part I



In Part I we recapitulate the general ideas of how to derive the laws of 
ray optics from Maxwell’s equations. We presuppose a general-relativistic 
spacetime as background, and we consider media which are general enough 
to elucidate all relevant features of the method. Chapter 2 treats the case of 
a linear (not necessarily isotropic) dielectric and permeable medium in full 
detail. Chapter 3 discusses dispersive media in general and a simple plasma 
model in particular. In this way the material presented in Part I serves two 
purposes. First, it motivates our mathematical framework for ray optics, to
be set up in Part II below. Second, it provides us with physically important
examples of ray-optical structures to which we shall return frequently.



1.1 A brief guide to the literature 

In Part I we have to assume some familiarity on the reader’s side with 
Maxwell’s equations in matter on a general-relativistic spacetime. Whereas 
vacuum Maxwell’s equations are detailed in any textbook on general relativ- 
ity, the matter case is not usually treated in extenso. For general aspects of 
the phenomenology of electromagnetic media in general relativity we refer 
to Bressan [17], who gives many earlier references. The case of a linear (not
necessarily isotropic) dielectric and permeable medium which is at the basis
of Chap. 2 is briefly treated by Schmutzer [127], Chap. IV, following an
original article by Marx [91]. The general-relativistic plasma model which is
at the basis of Chap. 3 is systematically treated in two articles by Breuer and
Ehlers [18] [19]; for earlier work on the same subject we refer to Madore [90],
to Bičák and Hadrava [14], and to Anile and Pantano [5] [6].

As an aside it should be mentioned that the phenomenological theory
of electromagnetic media can be derived from electron theory by statistical
methods, i.e., that the macroscopic (phenomenological) Maxwell equations
can be derived from a sort of microscopic Maxwell equations. For linear
isotropic media in inertial motion on flat spacetime this is a standard
textbook matter; the generalization to accelerated media is due to Kaufmann [67].
For a general-relativistic plasma, the derivation of phenomenological
properties from the kinetic theory of photons is discussed in the above-mentioned
article by Bičák and Hadrava [14].

The main topic of Part I is the derivation of the laws of ray optics from 
Maxwell’s equations. The basic idea is to make an approximate-plane-wave 
ansatz for the electromagnetic field and to assume that this ansatz satis- 
fies Maxwell’s equations in an asymptotic sense for high frequencies. This 
results in a dynamical law for wave surfaces which can be rewritten equiv- 
alently as a dynamical law for rays. In optics the dynamical law for wave 
surfaces is usually called the eikonal equation. It is formally analogous to the 
Hamilton-Jacobi equation of classical mechanics, whereas the dynamical law
for rays is formally analogous to Hamilton’s equations. Mathematically, this 
so-called ray method is, of course, not restricted to Maxwell’s equations but 
applies equally well to other partial differential equations with or without 
relevance to physics. In this sense, the ray method has applications not only 
to optics but also to acoustics and to wave mechanics. In the latter context, 
the ray method is known as the JWKB method, referring to the pioneering work
of Jeffreys, Wentzel, Kramers and Brillouin, and is detailed in virtually any
textbook on quantum mechanics. 
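
The formal analogy between the dynamical law for rays and Hamilton’s equations can be made concrete with a small numerical sketch. The following Python fragment is an illustration only and not part of the formalism of this monograph. It assumes flat space, a stratified isotropic medium with a hypothetical index of refraction n(y) = 1 + y/2, and the Hamiltonian H(x, p) = (|p|^2 - n(y)^2)/2, whose zero level set plays the role of the eikonal equation. All function names are chosen for this illustration.

```python
import math

def n(y):
    """Hypothetical index of refraction of a stratified medium (assumption)."""
    return 1.0 + 0.5 * y

def dn_dy(y):
    return 0.5

def hamiltonian(state):
    """H = (p_x^2 + p_y^2 - n(y)^2) / 2; rays live on the level set H = 0."""
    x, y, px, py = state
    return 0.5 * (px * px + py * py - n(y) ** 2)

def rhs(state):
    """Hamilton's equations: dx/ds = dH/dp, dp/ds = -dH/dx."""
    x, y, px, py = state
    return (px, py, 0.0, n(y) * dn_dy(y))

def rk4_step(state, ds):
    """One classical Runge-Kutta step of size ds."""
    def shifted(s, k, factor):
        return tuple(si + factor * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, ds / 2))
    k3 = rhs(shifted(state, k2, ds / 2))
    k4 = rhs(shifted(state, k3, ds))
    return tuple(s + ds / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def trace_ray(theta0, steps=1000, ds=1e-3):
    """Trace a ray starting at the origin; theta0 is the angle to the y-axis."""
    state = (0.0, 0.0, n(0.0) * math.sin(theta0), n(0.0) * math.cos(theta0))
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, ds)
        path.append(state)
    return path

path = trace_ray(math.radians(30.0))
snell_start = path[0][2]    # p_x = n sin(theta) at the start
snell_end = path[-1][2]     # p_x at the end of the ray
```

Since this Hamiltonian does not depend on x, the momentum component p_x = n sin(theta) is conserved along the ray, which is Snell’s law for a stratified medium; the value of H stays close to zero up to the integration error.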

In this brief guide to the literature we shall concentrate on the ray method 
in optics. As to other applications we refer to the comprehensive list of refer- 
ences given in monographs such as Keller, Lewis and Seckler [70] or Jeffrey 
and Kawahara [66]. Purely mathematical aspects of the ray method can be
found in textbooks on partial differential equations. Particularly useful for 
our purposes are, e.g., the books by Chazarain and Piriou [26] and by Egorov 
and Shubin [36]. 

Whereas rudiments of the ray method can be traced back to work of Li- 
ouville and Green around 1830, it was first carried through in the context 
of optics by Sommerfeld and Runge [132] in the year 1911, following a sug- 
gestion by Debye. The work of Sommerfeld and Runge was restricted to the 
vacuum Maxwell equations in an inertial system, and the only goal was to 
derive the corresponding eikonal equation. Their treatment was generalized 
and systematized by Luneburg [88] who considered infinite asymptotic series 
solutions rather than just asymptotic solutions of lowest order as Sommer- 
feld and Runge did. Later, the method was extended from the vacuum case 
to the case of light propagation in matter. This was a very active field of 
research in the 1960s, see, e.g., Lewis [84], Chen [27] and Kravtsov [75]. All
these papers are restricted to special relativity in the sense that they are 
presupposing a flat spacetime. Nonetheless, the techniques used are of inter- 
est also in view of general relativity. The reason is that vacuum Maxwell’s 
equations on a general-relativistic spacetime are very similar to Maxwell’s 
equations in an inhomogeneous medium on flat spacetime, at least locally. 
This was first observed by Plebanski [120]. Note, however, that global as- 
pects which do not carry over to general relativity are brought into play 
whenever temporal Fourier expansions (as e.g. by Lewis [84]) and/or spatial 
Fourier expansions (as e.g. by Chen [27]) are used. A global treatment of the
ray method that does carry over to general relativity is possible in terms of 
the Lagrangian manifold techniques introduced in the 1960s by Maslov and 
Arnold, see Arnold [7], Maslov [94], Duistermaat [31] or Guillemin and Stern- 
berg [55]. In Part I we are concerned with local questions only. However, we 
shall touch upon Lagrangian manifold techniques and their relevance for the 
investigation of caustics in Part II below. 

In general relativity, the passage from Maxwell’s equations to ray optics 
was carried through for the first time by Laue [77] in the year 1920. In this 
paper, which is the written version of a talk given by Laue at the 86. Natur- 
forscherversammlung, the author demonstrated how to derive from vacuum 
Maxwell’s equations on a curved spacetime the light-like geodesic equation for 
the rays. Laue’s treatment followed closely the seminal paper by Sommerfeld 
and Runge [132]. A more systematic general-relativistic treatment of the ray 
method in optics, including asymptotic solutions of arbitrarily high order, was 
brought forward much later by Ehlers [38]. He considered linear isotropic non- 
dispersive media on an arbitrary general-relativistic spacetime and derived 
not only the eikonal equation for the rays but also transport equations of arbi- 
trary order for the polarization plane along the rays. In particular, his results 
put earlier findings about light propagation in such media by Gordon [50] 
and Pham Mau Quan [117] on a mathematically firm basis. At least for the 
vacuum case, the main results can now be found in many textbooks on gen- 
eral relativity, see, e.g., Misner, Thorne and Wheeler [98], Straumann [136], 
or Stephani [133]. The general-relativistic relevance of higher order terms in 
the asymptotic series expansion was discussed by Dwivedi and Kantowski 
[32] and by Anile [4]. A general-relativistic treatment of the ray method for 
dispersive media, exemplified with a special plasma model, is due to Breuer 
and Ehlers [18] [19] who modified and enhanced earlier work by Madore [90], 
by Bičák and Hadrava [14], and by Anile and Pantano [5] [6].



1.2 Assumptions and notations 

We assume a general-relativistic spacetime, i.e., a four-dimensional $C^\infty$
manifold with a metric of Lorentzian signature $(+,+,+,-)$. On this spacetime
background we consider Maxwell’s equations, using units making the dielectricity
and permeability constants of vacuum equal to one, $\varepsilon_0 = \mu_0 = 1$.
Thereby, in particular, the vacuum velocity of light is set equal to one. We
restrict ourselves to the $C^\infty$ category in the sense that throughout Part I all
maps and tensor fields are tacitly assumed to be infinitely often differentiable.
We work in local coordinates using standard index notation. Throughout,
Einstein’s summation convention is in force with latin indices running from 1
to 4 and with greek indices running from 1 to 3. The (covariant) components
of the spacetime metric will be denoted by $g_{ab}$. As usual, we define $g^{bc}$ by
$g_{ab}\,g^{bc} = \delta_a^c$, where $\delta_a^c$ denotes the Kronecker delta, and we use $g_{ab}$ (and
$g^{ab}$, respectively) to lower (and raise, respectively) indices. With respect to
a coordinate system $x = (x^1, x^2, x^3, x^4)$, partial derivatives $\partial/\partial x^a$ will
be denoted by $\partial_a$ for short, whereas $\nabla_a$ means covariant derivative with
respect to the Levi-Civita connection of our metric. For the sake of brevity,
we shall speak of “a tensor field $A^{ab}$” rather than of “a tensor field whose
contravariant components in a coordinate system are $A^{ab}$”, etc. Our treatment
will be purely local throughout Part I. Therefore, the use of local coordinates
and index notation is no restriction whatsoever.




2. Light propagation in linear dielectric 
and permeable media 



On our spacetime manifold we consider Maxwell’s equations in a linear but 
not necessarily isotropic medium, i.e., in a medium phenomenologically
characterized by a dielectricity tensor field and a permeability tensor field. It is
our goal to derive and to discuss the laws of ray optics in such a medium. 
The standard textbook problem of light propagation in vacuo is, of course, 
included as a special case. 

The results of this chapter cover a wide range of applications including 
light propagation in gases (isotropic case) and crystals (anisotropic case) 
as long as dispersion is ignored. For dispersive media we refer to Chap. 3 
below. In view of applications to astrophysics, the isotropic case is more 
interesting than the (much more complicated) anisotropic case. On the other 
hand, a thorough treatment of the anisotropic case is highly instructive from 
a methodological point of view. In particular, it gives us the opportunity to 
discuss the phenomenon of birefringence. 



2.1 Maxwell’s equations in linear dielectric 
and permeable media 

On our spacetime manifold, the source-free Maxwell equations for (macroscopic)
electromagnetic fields in matter can be written in local coordinates as

$$\eta^{abcd}\,\nabla_b F_{cd} = 0 \quad\text{and}\quad \nabla^b G_{ab} = 0 , \qquad (2.1)$$

or, using partial rather than covariant derivatives, as

$$\eta^{abcd}\,\partial_b F_{cd} = 0 \quad\text{and}\quad \eta^{abcd}\,\partial_b\big(\eta_{cdef}\,G^{ef}\big) = 0 . \qquad (2.2)$$

In (2.1) and (2.2), $\eta_{abcd}$ denotes the totally antisymmetric Levi-Civita tensor
field (volume form) of our metric which is defined by the equation

$$\eta_{1234} = \pm\sqrt{|\det(g_{cd})|} . \qquad (2.3)$$

Here the plus sign is valid if the coordinate system is right-handed and the
minus sign is valid if it is left-handed. In other words, we have to choose an
orientation on the domain of our coordinate system to fix the sign ambiguity
of the Levi-Civita tensor field. However, this is irrelevant since Maxwell’s
equations are invariant under $\eta_{abcd} \mapsto -\eta_{abcd}$, and so are all the relevant
results in Part I. If the reader is not familiar with volume elements he or she
may consult, e.g., Wald [146], p. 432.

$F_{ab}$ and $G_{ab}$ denote the electromagnetic field strength and the electromagnetic
excitation, respectively, both of which are antisymmetric second rank
tensor fields. With respect to a reference system, given in terms of a time-like
vector field $U^a$ with $U^a U_a = -1$, we can introduce the electric field strength

$$E_a = F_{ab}\,U^b \qquad (2.4)$$

and the magnetic field strength

$$B_a = -\tfrac{1}{2}\,\eta_{abcd}\,U^b F^{cd} \qquad (2.5)$$

such that

$$F_{ab} = -\eta^{cd}{}_{ab}\,B_c U_d + E_b U_a - E_a U_b . \qquad (2.6)$$



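Here we have used the familiar property

$$\eta_{abcd}\,\eta^{efgd} = -\delta_a^e \delta_b^f \delta_c^g - \delta_a^f \delta_b^g \delta_c^e - \delta_a^g \delta_b^e \delta_c^f + \delta_a^f \delta_b^e \delta_c^g + \delta_a^e \delta_b^g \delta_c^f + \delta_a^g \delta_b^f \delta_c^e \qquad (2.7)$$

of the Levi-Civita tensor field, cf., e.g., Wald [146], equation (B.2.12).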
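
The contraction identity (2.7) can be checked by brute force. The following Python sketch does this under simplifying assumptions that are not taken from the text: flat spacetime with the diagonal metric diag(+1, +1, +1, -1), right-handed coordinates, and indices 0 to 3 standing for the indices 1 to 4 used here. The helper names are chosen for this illustration.

```python
from itertools import permutations, product

def perm_sign(p):
    """Sign of a permutation given as a tuple of distinct integers."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign

# Diagonal entries of the Minkowski metric, signature (+, +, +, -).
# Index 3 plays the role of x^4. For this metric g_ab and g^ab coincide.
METRIC = [1, 1, 1, -1]

# eta_{abcd} with eta_{1234} = +sqrt(|det g|) = 1 (right-handed coordinates).
ETA = {p: perm_sign(p) for p in permutations(range(4))}

def eta_lower(a, b, c, d):
    return ETA.get((a, b, c, d), 0)

def eta_upper(a, b, c, d):
    """All four indices raised with the diagonal inverse metric."""
    return METRIC[a] * METRIC[b] * METRIC[c] * METRIC[d] * eta_lower(a, b, c, d)

def delta(i, j):
    return 1 if i == j else 0

def lhs(a, b, c, e, f, g):
    """Left-hand side of (2.7): eta_{abcd} eta^{efgd}, summed over d."""
    return sum(eta_lower(a, b, c, d) * eta_upper(e, f, g, d) for d in range(4))

def rhs(a, b, c, e, f, g):
    """Right-hand side of (2.7): minus the antisymmetrized delta product."""
    return (- delta(a, e) * delta(b, f) * delta(c, g)
            - delta(a, f) * delta(b, g) * delta(c, e)
            - delta(a, g) * delta(b, e) * delta(c, f)
            + delta(a, f) * delta(b, e) * delta(c, g)
            + delta(a, e) * delta(b, g) * delta(c, f)
            + delta(a, g) * delta(b, f) * delta(c, e))

# Compare both sides for all 4^6 index combinations.
identity_holds = all(lhs(*idx) == rhs(*idx)
                     for idx in product(range(4), repeat=6))
```

The overall minus sign on the right-hand side reflects the Lorentzian signature: raising all four indices of the Levi-Civita tensor with a metric of determinant -1 flips its sign.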
Similarly, we introduce the electric excitation

$$D_a = G_{ab}\,U^b \qquad (2.8)$$

and the magnetic excitation

$$H_a = -\tfrac{1}{2}\,\eta_{abcd}\,U^b G^{cd} \qquad (2.9)$$

such that

$$G_{ab} = -\eta^{cd}{}_{ab}\,H_c U_d + D_b U_a - D_a U_b . \qquad (2.10)$$

With respect to the reference system used for their definitions, the electric
and magnetic field strengths are purely spatial one-forms, and so are the
electric and magnetic excitations,

$$E_a U^a = B_a U^a = 0 \quad\text{and}\quad D_a U^a = H_a U^a = 0 . \qquad (2.11)$$
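
The decomposition (2.4) to (2.6) can be illustrated numerically. The sketch below is a plain Python check under assumptions that are not part of the text: flat spacetime with metric diag(+1, +1, +1, -1), the reference system $U^a = (0, 0, 0, 1)$, indices 0 to 3 standing for 1 to 4, and arbitrarily chosen sample values for $E_a$ and $B_a$. It builds $F_{ab}$ from (2.6) and recovers the field strengths from (2.4) and (2.5).

```python
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation given as a tuple of distinct integers."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign

R = range(4)
METRIC = [1.0, 1.0, 1.0, -1.0]             # diag(g_ab) = diag(g^ab)
ETA = {p: perm_sign(p) for p in permutations(R)}
U_UP = [0.0, 0.0, 0.0, 1.0]                # U^a with U^a U_a = -1
U_LOW = [METRIC[a] * U_UP[a] for a in R]   # U_a

def eta(a, b, c, d):
    """eta_{abcd} with eta_{1234} = +1."""
    return ETA.get((a, b, c, d), 0)

def field_strength(E, B):
    """F_ab = -eta^{cd}_{ab} B_c U_d + E_b U_a - E_a U_b, equation (2.6)."""
    F = [[0.0] * 4 for _ in R]
    for a in R:
        for b in R:
            dual = sum(METRIC[c] * METRIC[d] * eta(c, d, a, b) * B[c] * U_LOW[d]
                       for c in R for d in R)
            F[a][b] = -dual + E[b] * U_LOW[a] - E[a] * U_LOW[b]
    return F

def electric(F):
    """E_a = F_ab U^b, equation (2.4)."""
    return [sum(F[a][b] * U_UP[b] for b in R) for a in R]

def magnetic(F):
    """B_a = -(1/2) eta_{abcd} U^b F^{cd}, equation (2.5)."""
    F_up = [[METRIC[c] * METRIC[d] * F[c][d] for d in R] for c in R]
    return [-0.5 * sum(eta(a, b, c, d) * U_UP[b] * F_up[c][d]
                       for b in R for c in R for d in R) for a in R]

# Sample spatial one-forms (fourth component zero), chosen arbitrarily.
E = [1.0, -2.0, 0.5, 0.0]
B = [0.3, 0.7, -1.1, 0.0]

F = field_strength(E, B)
E_rec = electric(F)
B_rec = magnetic(F)
```

In this sketch `E_rec` and `B_rec` reproduce the input values, and both are orthogonal to $U^a$ in the sense of (2.11) because their fourth components vanish.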

Our terminology of calling $E_a$ and $B_a$ the “field strengths” (in German:
Feldstärken) and $D_a$ and $H_a$ the “excitations” (in German: Erregungen) follows
Gustav Mie and Arnold Sommerfeld. This terminology is reasonable
since $E_a$ and $B_a$ determine the Lorentz force exerted on a charged test particle
whereas, in the presence of field-producing charges and currents, $D_a$ and
$H_a$ are the fields “excited” by those sources via Maxwell’s equations. The
traditional terminology of calling $H_a$ the “magnetic field strength” is misleading.
Moreover, it is highly inconvenient from a relativistic point of view
where $E_a$ and $B_a$, rather than $E_a$ and $H_a$, are united into an antisymmetric
second rank tensor field on spacetime.

In what follows we consider Maxwell’s equations in the form (2.2). As 
long as only the metric is known, (2.2) gives us eight component equations for 
twelve unknown functions. (The unknown functions are the six independent 
components of the electromagnetic field strength plus the six independent 
components of the electromagnetic excitation.) Hence, (2.2) is an underde- 
termined system of partial differential equations. It must be supplemented 
by constitutive equations relating the electromagnetic field strength with the 
electromagnetic excitation. Thereby the medium is characterized in a phe- 
nomenological way. In this chapter we consider linear dielectric and permeable 
media according to the following definition. 

Definition 2.1.1. A linear dielectric and permeable medium is, by definition,
a medium characterized by constitutive equations that take the form

$$D_a = \varepsilon_a{}^b E_b \quad\text{and}\quad B_a = \mu_a{}^b H_b , \qquad (2.12)$$

in some reference system $U^a$, with second rank tensor fields $\varepsilon_a{}^b$ and $\mu_a{}^b$
satisfying the following conditions:

(a) $U^a \varepsilon_a{}^b = 0$ and $U^a \mu_a{}^b = 0$.

(b) $\varepsilon^{ab} = \varepsilon^{ba}$ and $\mu^{ab} = \mu^{ba}$.

(c) $\varepsilon^{ab} Z_a Z_b > 0$ and $\mu^{ab} Z_a Z_b > 0$ for all $(Z_1, Z_2, Z_3, Z_4) \neq (0, 0, 0, 0)$ with
$U^a Z_a = 0$.

We refer to the distinguished reference system as the rest system, to
$\varepsilon_a{}^b$ as the dielectricity tensor field and to $\mu_a{}^b$ as the permeability tensor
field of the medium.

Condition (a) of Definition 2.1.1 guarantees that the constitutive equations
(2.12) are in agreement with $D_a U^a = 0$ and $B_a U^a = 0$. Conditions (b)
and (c) imply that in the rest system of the medium the energy density

$$w = \tfrac{1}{2}\left(D_a E^a + B_a H^a\right) \qquad (2.13)$$

of the electromagnetic field is positive definite. Altogether, conditions (a),
(b) and (c) guarantee that the dielectricity and permeability tensor fields are
“spatially invertible”. We can, thus, define $(\mu^{-1})_a{}^b$ by the properties

$$(\mu^{-1})_a{}^b\,\mu_b{}^c = \delta_a^c + U_a U^c \quad\text{and}\quad (\mu^{-1})_a{}^b\,U_b = 0 . \qquad (2.14)$$
The constitutive equations (2.12) can then be united in a single equation,

$$G_{ab} = \left(\tfrac{1}{2}\,\eta^{cd}{}_{ab}\,(\mu^{-1})_c{}^g\,\eta_{gh}{}^{ef}\,U^h U_d + \varepsilon_b{}^e\,U^f U_a - \varepsilon_a{}^e\,U^f U_b\right) F_{ef} . \qquad (2.15)$$

The following special case deserves particular interest. 

Definition 2.1.2. A linear dielectric and permeable medium is called isotropic
if the dielectricity and permeability tensor fields are of the special form

$$\varepsilon_a{}^b = \varepsilon\,(\delta_a^b + U_a U^b) \quad\text{and}\quad \mu_a{}^b = \mu\,(\delta_a^b + U_a U^b) , \qquad (2.16)$$

with some scalar functions $\varepsilon$ and $\mu$. (Condition (c) of Definition 2.1.1 then
requires $\varepsilon$ and $\mu$ to be strictly positive.)

In the isotropic case, (2.15) reduces to

$$G_{ab} = \frac{1}{\mu}\left(F_{ab} + (1 - \varepsilon\mu)\,(F_{ad}\,U_b - F_{bd}\,U_a)\,U^d\right) . \qquad (2.17)$$

In particular, vacuum can be characterized as a linear isotropic medium with
$\varepsilon = \mu = 1$. In this case (and, more generally, in any isotropic medium with
$\varepsilon\mu = 1$) $U^a$ drops out from (2.17) and the constitutive equations take the
form (2.12) in any reference system. This is in agreement with the obvious
fact that for vacuum any reference system can be viewed as the rest system
of the medium.
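
As a plausibility check, the isotropic relation (2.17) can be compared numerically with the decomposition (2.10) evaluated for $D_a = \varepsilon E_a$ and $H_a = B_a/\mu$. The Python sketch below does this under assumptions made only for the illustration: flat spacetime with metric diag(+1, +1, +1, -1), rest system $U^a = (0, 0, 0, 1)$, indices 0 to 3 standing for 1 to 4, and arbitrary sample values for $\varepsilon$, $\mu$, $E_a$ and $B_a$.

```python
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation given as a tuple of distinct integers."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign

R = range(4)
METRIC = [1.0, 1.0, 1.0, -1.0]             # diag(g_ab) = diag(g^ab)
ETA = {p: perm_sign(p) for p in permutations(R)}
U_UP = [0.0, 0.0, 0.0, 1.0]                # rest system U^a
U_LOW = [METRIC[a] * U_UP[a] for a in R]   # U_a

def eta(a, b, c, d):
    return ETA.get((a, b, c, d), 0)

def two_form(X, Y):
    """T_ab = -eta^{cd}_{ab} Y_c U_d + X_b U_a - X_a U_b.

    With (X, Y) = (E, B) this is (2.6); with (X, Y) = (D, H) it is (2.10).
    """
    return [[-sum(METRIC[c] * METRIC[d] * eta(c, d, a, b) * Y[c] * U_LOW[d]
                  for c in R for d in R)
             + X[b] * U_LOW[a] - X[a] * U_LOW[b]
             for b in R] for a in R]

def isotropic_excitation(F, eps, mu):
    """G_ab from (2.17) for scalar eps and mu."""
    FU = [sum(F[a][d] * U_UP[d] for d in R) for a in R]   # F_ad U^d
    return [[(F[a][b] + (1.0 - eps * mu) * (FU[a] * U_LOW[b] - FU[b] * U_LOW[a])) / mu
             for b in R] for a in R]

# Arbitrary sample data: spatial one-forms and positive scalars.
E = [1.0, -2.0, 0.5, 0.0]
B = [0.3, 0.7, -1.1, 0.0]
eps, mu = 3.0, 2.0

F = two_form(E, B)
G_from_217 = isotropic_excitation(F, eps, mu)
G_from_210 = two_form([eps * e for e in E], [b / mu for b in B])

max_diff = max(abs(G_from_217[a][b] - G_from_210[a][b]) for a in R for b in R)
```

Within rounding error `max_diff` vanishes. This is only a check in one frame and for one set of values, not a proof of (2.17).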

We emphasize that our phenomenological constitutive equations are physically
reasonable in the rotational as well as in the irrotational case, i.e., $U^a$
need not be hypersurface-orthogonal. Although this should be clear from the 
general rules of relativity, there is still a debate on this issue, even in the case 
of an isotropic medium on flat spacetime, see, e.g., Pellegrini and Swift [106]. 

We are now going to analyze the dynamics of electromagnetic fields in
a linear dielectric and permeable medium. We have already mentioned that
Maxwell’s equations (2.2) alone give us eight equations for twelve unknown
functions. With (2.12) at hand, and assuming that $U^a$, $\varepsilon_a{}^b$ and $\mu_a{}^b$ are known,
we can eliminate six of the unknown functions. Now (2.2) gives us eight
equations for six functions, i.e., the system looks overdetermined. However, only
six of those eight equations are evolution equations, governing the dynamics of
electromagnetic fields, whereas the other two equations are constraints. This
is most easily verified in a local coordinate system $(x^1, x^2, x^3, x^4)$ in which
the hypersurfaces $x^4 = \text{const.}$ are space-like such that $x^4$ can be viewed as
a local time function. Owing to the antisymmetry of the Levi-Civita tensor field,
the $a = 4$ components of equations (2.2) do not involve any $\partial_4$ derivative.
Hence, these two equations are to be viewed as constraints whereas the
remaining six equations, i.e., the $a = 1, 2, 3$ components of equations (2.2),
are the evolution equations governing the dynamics. Again owing to the
antisymmetry of the Levi-Civita tensor field, the evolution equations preserve the
constraints in the following sense. If the constraints are written in the form
$C_1 = 0$ and $C_2 = 0$, then the evolution equations imply that $\partial_4 C_1 = f_1 C_1$
and $\partial_4 C_2 = f_2 C_2$ with some spacetime functions $f_1$ and $f_2$. Hence, if a
solution of the evolution equations satisfies the constraints on some initial
hypersurface $x^4 = \text{const.}$, then it satisfies the constraints everywhere (on
some neighborhood of any point of the initial hypersurface, that is). In other
words, locally around any one point all solutions of Maxwell’s equations can
be found in the following way.

Step 1. Choose a space-like hypersurface through that point. 

Step 2. Choose a local coordinate system such that the chosen hypersurface
is given by the equation $x^4 = \text{const.}$

Step 3. Solve the evolution equations with all initial data that satisfy 
the constraints. 



In the rest of this section we shall prove that the initial value problem con- 
sidered in Step 3 is well-posed in the sense that it is characterized by a local 
existence and uniqueness theorem, provided that the initial hypersurface has 
been chosen appropriately. Conditions (a), (b) and (c) of Definition 2.1.1 will 
prove essential for this result. 

First we introduce special coordinates according to the following defini- 
tion. 



Definition 2.1.3. Let $U^a$ denote the rest system of a linear dielectric and
permeable medium and fix a spacetime point $x_0$. Then a local coordinate system
$(x^1, x^2, x^3, x^4)$, defined on a neighborhood of $x_0$, is called adapted to $U^a$
near $x_0$ if

(a) $U^a$ is given by the equation $U^a = \delta^a_4$;

(b) $g_{\mu 4} = 0$ for $\mu = 1, 2, 3$ at the point $x_0$.

For any linear dielectric and permeable medium, it is obvious that adapted
coordinates are characterized by the following existence and uniqueness property.
If we choose a spacetime point $x_0$ and a hypersurface $S$ that is orthogonal
to $U^a$ at $x_0$, then there is a coordinate system adapted to $U^a$ near $x_0$ such
that $S$ is represented by the equation $x^4 = \text{const.}$ Another coordinate system
$(x'^1, x'^2, x'^3, x'^4)$ is, again, adapted to $U^a$ near $x_0$ if and only if it is related
to $(x^1, x^2, x^3, x^4)$ by a coordinate transformation of the special form

$$x^\mu \mapsto x'^\mu(x^1, x^2, x^3) , \qquad x^4 \mapsto x'^4 = x^4 + f(x^1, x^2, x^3) , \qquad \partial_\mu f\big|_{x_0} = 0 . \qquad (2.18)$$

Condition (b) of Definition 2.1.3 makes sure that at the point $x_0$ the
hypersurface $x^4 = \text{const.}$ intersects the respective integral curve of $U^a$
orthogonally. This implies that, on a sufficiently small neighborhood of $x_0$, all
hypersurfaces $x^4 = \text{const.}$ are space-like. Of course, they cannot be orthogonal
to $U^a$ on a whole neighborhood unless the medium is non-rotating. Hence,
in an adapted coordinate system the mixed components $g_{\mu 4}$ and $g^{\mu 4}$ of the
metric need not vanish except at the central point $x_0$. The spatial components
give positive definite $3 \times 3$ matrices $(g_{\mu\nu})$ and $(g^{\mu\nu})$ on some neighborhood
of $x_0$; at the point $x_0$ these matrices are inverse to each other. The temporal
components $g_{44}$ and $g^{44}$ are strictly negative functions on some neighborhood
of $x_0$; at the point $x_0$ they are inverse to each other.

Now we consider a linear dielectric and permeable medium in a coordinate
system adapted to its rest system near some point $x_0$. Then (2.11) reduces
to

$$B_4 = E_4 = 0 \quad\text{and}\quad D_4 = H_4 = 0 \qquad (2.19)$$

owing to condition (a) of Definition 2.1.3. Hence, (2.12) simplifies to

$$D_\sigma = \varepsilon_\sigma{}^\rho E_\rho \quad\text{and}\quad B_\sigma = \mu_\sigma{}^\rho H_\rho . \qquad (2.20)$$



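Conditions (b) and (c) of Definition 2.1.1 guarantee that $\varepsilon_\sigma{}^\rho$ and $\mu_\sigma{}^\rho$ are
positive definite and symmetric with respect to $g^{\mu\nu}$. We can, thus, define
$v_\sigma{}^\rho$ and $w_\sigma{}^\rho$, which are again positive definite and symmetric with respect to
$g^{\mu\nu}$, by

$$v_\sigma{}^\tau\,v_\tau{}^\lambda\,\mu_\lambda{}^\rho = \delta_\sigma^\rho \quad\text{and}\quad w_\sigma{}^\tau\,w_\tau{}^\lambda\,\varepsilon_\lambda{}^\rho = \delta_\sigma^\rho . \qquad (2.21)$$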



For the following it will be convenient to introduce the quantities

$$Z_\rho = v_\rho{}^\sigma B_\sigma \quad\text{and}\quad Y_\rho = w_\rho{}^\sigma D_\sigma, \qquad (2.22)$$

and to use $Z_1, Z_2, Z_3, Y_1, Y_2, Y_3$ for the six independent components of
the electromagnetic field. That is to say, we start from Maxwell's equations
(2.2) with (2.6) and (2.10); we use part (a) of Definition 2.1.3 and equation
(2.19); we eliminate $E_\sigma$ and $H_\sigma$ with the help of (2.20);
finally, we express $D_\sigma$ and $B_\sigma$ in terms of $Z_\rho$ and
$Y_\rho$ by means of (2.22). After a little bit of algebra, the $a = 1,2,3$
components of Maxwell's equations (2.2) give us evolution equations of the
form



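$$L^a \partial_a \begin{pmatrix} Z \\ Y \end{pmatrix} + M \begin{pmatrix} Z \\ Y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad (2.23)$$

for the dynamical variables

$$Z = \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}. \qquad (2.24)$$

Here $L^4$ and $L^\sigma$ are $x$-dependent $6 \times 6$ matrices of the form

$$L^4 = \begin{pmatrix} 1 & Q^T \\ Q & 1 \end{pmatrix} \quad\text{and}\quad L^\sigma = \begin{pmatrix} 0 & (A^\sigma)^T \\ A^\sigma & 0 \end{pmatrix}, \qquad (2.25)$$

where $Q$ is a $3 \times 3$ matrix with components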

Qx^ - wx' = w\ gp4 ; (2.26) 

$A^\sigma$ is a $3 \times 3$ matrix with components

= (2.27) 

( • means transposition with respect to such that, e.g., 

= (2.28) 



$M$ is a $6 \times 6$ matrix whose components involve the spacetime metric
along with $v_\rho{}^\sigma$ and $w_\rho{}^\sigma$.

For the investigation of the evolution equations (2.23) the following two 
observations are crucial. 

(a) At the central point $x_0$ of our adapted coordinate system we have
$g_{\rho 4} = 0$ and, thus, $Q = 0$. By continuity, $L^4$ is invertible on
some neighborhood of $x_0$ to which we can restrict our considerations.
Hence, (2.23) can be solved for the $\partial_4$ derivative.

(b) $L^4$ and $L^\sigma$ are symmetric (= self-adjoint) with respect to the
positive definite scalar product

Here the dots on the right-hand side refer to the scalar product defined
by

$$a \cdot b = g^{\mu\nu}\, \overline{a}_\mu\, b_\nu \qquad (2.30)$$

for any two $\mathbb{C}^3$-valued functions $a$ and $b$ on the neighborhood
considered, with the overbar denoting complex conjugation. (To be sure, in
(2.23) all quantities are real. For later purposes, however, we need the
complex version of this scalar product.)

These two observations imply that (2.23) satisfies the defining properties of a
symmetric hyperbolic system of partial differential equations. By a well-known
theorem (see, e.g., Theorem 4.5 in Chazarain and Piriou [26] or Sect. 4.12
in Egorov and Shubin [36]) this guarantees local existence and uniqueness
of a solution $Z$, $Y$ for any initial data $Z_0$, $Y_0$ given on our
hypersurface $x^4 = \text{const}$. (Please recall our stipulation of tacitly
working in the $C^\infty$ category throughout Part I. Had we restricted
ourselves to the analytic category instead, property (a) alone would guarantee
local existence and uniqueness of a solution to any initial data, owing to the
well-known Cauchy-Kovalevsky theorem.) Moreover, the fact that (2.23) is
symmetric hyperbolic implies that solutions $Z$, $Y$ are bounded in terms of
so-called energy inequalities, see, e.g., Theorem 4.3 in Chazarain and Piriou
[26] or Theorem 2.63 in Egorov and Shubin [36]. We are going to employ these
facts later.

The explicit form of the matrix $M$ in (2.23) will be of no interest for
us in the following. What really matters is the structure of the $L^a$, i.e.,
the information contained in the $6 \times 6$ matrix

$$L(x,p) = p_a L^a(x). \qquad (2.31)$$

Here the first argument $x = (x^1,x^2,x^3,x^4)$ ranges over the coordinate
neighborhood considered and the second argument $p = (p_1,p_2,p_3,p_4)$ ranges
over $\mathbb{R}^4$. The matrix $L(x,p)$ defined by (2.31) is called the
principal matrix or the characteristic matrix of the system of differential
equations (2.23). Its determinant, which gives a homogeneous polynomial of
degree six in the $p_a$, is called the principal determinant or the
characteristic determinant of (2.23). We shall see later that the laws of ray
optics in our medium are coded in the characteristic determinant.

The notions of characteristic matrix and characteristic determinant can
be introduced for any system of $k$th order partial differential equations,
linear in the highest order derivatives, that gives $n$ equations for $n$
dynamical variables. The characteristic matrix is then formed in a fashion
similar to (2.31) from the coefficients of the highest order derivatives. If
these coefficients are independent of the unknown functions, i.e., if the
system of differential equations is semi-linear, the characteristic matrix is
of the form

$$L(x,p) = p_{a_1} \cdots p_{a_k}\, L^{a_1 \cdots a_k}(x). \qquad (2.32)$$

Hence, its determinant gives a homogeneous polynomial of degree $nk$ with
respect to the $p_a$.
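
As a minimal illustration of these notions, the following Python sketch forms the characteristic matrix and the characteristic determinant for a toy first order system with $n = 2$ and $k = 1$. The system (a first order form of the wave equation in one space dimension) and all function names are assumptions chosen for illustration; they are not taken from the text. The determinant comes out as $p_t^2 - p_x^2$, which is homogeneous of degree $nk = 2$.

```python
# Toy first order system (an assumption for illustration, not the Maxwell system):
#   d_t u - d_x v = 0,   d_t v - d_x u = 0,
# written as L^t d_t w + L^x d_x w = 0 with w = (u, v).
L_T = [[1.0, 0.0], [0.0, 1.0]]
L_X = [[0.0, -1.0], [-1.0, 0.0]]


def principal_matrix(p_t, p_x):
    """Return L(p) = p_t L^t + p_x L^x, the analogue of (2.31)."""
    return [[p_t * L_T[i][j] + p_x * L_X[i][j] for j in range(2)]
            for i in range(2)]


def characteristic_determinant(p_t, p_x):
    """Determinant of the 2 x 2 principal matrix, equal to p_t^2 - p_x^2."""
    m = principal_matrix(p_t, p_x)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]
```

Rescaling $(p_t, p_x)$ by a factor $\lambda$ rescales the determinant by $\lambda^2$, and the determinant vanishes exactly for $p_t = \pm p_x$, the characteristic directions of this toy system.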



2.2 Approximate-plane-wave families

In the preceding section we have discussed Maxwell’s equations in a linear 
dielectric and permeable medium. The laws of light propagation in such a 
medium are determined by the dynamics of wavelike solutions of those equa- 
tions. In this section we clarify what is meant by the attribute “wavelike” . 
The following definition is basic. 

Definition 2.2.1. An approximate-plane-wave family is a one-parameter
family of antisymmetric second rank tensor fields of the form

$$F_{ab}(\alpha,x) = \mathrm{Re}\big\{ e^{iS(x)/\alpha}\, f_{ab}(\alpha,x) \big\} \qquad (2.33)$$

with the following properties.

(a) The coordinates $x = (x^1,x^2,x^3,x^4)$ range over some open subset of the
spacetime manifold and the parameter $\alpha$ ranges over the strictly positive
real numbers, $\alpha \in \mathbb{R}^+$.

(b) $S$ is a real-valued function whose gradient has no zeros, i.e.,

$$dS(x) = \big(\partial_1 S(x), \partial_2 S(x), \partial_3 S(x), \partial_4 S(x)\big) \neq (0,0,0,0) \qquad (2.34)$$

for all $x$ in the neighborhood considered. We refer to $S$ as the eikonal
function of the approximate-plane-wave family.

(c) For each $\alpha \in \mathbb{R}^+$, $f_{ab}(\alpha,\cdot)$ is a
complex-valued antisymmetric second rank tensor field. Moreover, $f_{ab}$
admits a Taylor expansion of the form

$$f_{ab}(\alpha,x) = \sum_{N=0}^{N_0+1} \alpha^N f^N_{ab}(x) + O(\alpha^{N_0+2}) \qquad (2.35)$$

for all integers $N_0 \ge -1$, where

$$f^N_{ab}(x) = \frac{1}{N!} \lim_{\alpha \to 0} \frac{\partial^N}{\partial \alpha^N} f_{ab}(\alpha,x). \qquad (2.36)$$

We refer to $f^N_{ab}$ as the $N$th order amplitude of the
approximate-plane-wave family.

(d) For all $x$ in the neighborhood considered,

$$\big(f^0_{ab}(x)\big) \neq 0. \qquad (2.37)$$

In (2.33), $i$ denotes, of course, the imaginary unit, $i^2 = -1$, and Re
denotes the real part of a complex number.

We call $S$ the “eikonal function” because an approximate-plane-wave family
satisfies Maxwell's equations in an asymptotic sense to be discussed later
only if $S$ satisfies a partial differential equation which is known as the
eikonal equation. The term “eikonal”, which was introduced in 1895 by Bruns
[23] in a more special context, is derived from the Greek word eikon which
means “image”. This terminology is, indeed, justified since the eikonal
equation is the fundamental equation of ray optics; so it governs, in
particular, the ray optical laws of image formation.

According to our general stipulation that all maps and tensor fields are
tacitly assumed to be infinitely often differentiable it goes without saying
that $f_{ab}(\alpha,x)$ is a $C^\infty$ function of $\alpha \in \mathbb{R}^+$.
A Taylor expansion of the form (2.35) is valid if and only if this function
admits a $C^\infty$ extension into the point $\alpha = 0$. Note that we do not
assume that the $O(\alpha^{N_0+2})$ term in (2.35) goes to zero for
$N_0 \to \infty$, i.e., we do not assume analyticity with respect to $\alpha$.

It is important to realize that an approximate-plane-wave family cannot
converge for $\alpha \to 0$. This is an immediate consequence of the following
lemma which will often be used in the following.

Lemma 2.2.1. Let $S$ be the eikonal function of an approximate-plane-wave
family, according to Definition 2.2.1 (b). Let $u$ be a complex valued function
defined on the same open subset of spacetime as the approximate-plane-wave
family. (As always in Part I, we tacitly assume that $u$ is of class
$C^\infty$ and, thus, continuous.) Then
$\lim_{\alpha \to 0} \mathrm{Re}\{e^{iS/\alpha} u\}$ exists pointwise only if
$u = 0$.

Proof. If $u$ is different from zero at some point, it is different from zero,
by continuity, on a whole neighborhood. For almost all points $x$ of this
neighborhood, (2.34) implies that $S(x) \neq 0$ and the limit does not exist. □
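
The mechanism behind Lemma 2.2.1 can be made concrete numerically. The following Python sketch evaluates $\mathrm{Re}\{e^{iS/\alpha}u\}$ at a single point along two sequences of $\alpha$ values that both tend to zero. The function names and the sample values are assumptions for illustration, and the sketch takes $S > 0$ and $u$ real, for which the two sequences give the values $+u$ and $-u$.

```python
import cmath
import math


def wave_value(S, u, alpha):
    """Re{exp(i S / alpha) u} for real S, complex u and alpha > 0."""
    return (cmath.exp(1j * S / alpha) * u).real


def two_subsequences(S, u, n_max=5):
    """Evaluate along alpha = S/(2 pi n) and alpha = S/((2n+1) pi).

    Both sequences tend to zero for n -> infinity (S > 0 is assumed).
    """
    crest = [wave_value(S, u, S / (2 * math.pi * n))
             for n in range(1, n_max + 1)]
    trough = [wave_value(S, u, S / ((2 * n + 1) * math.pi))
              for n in range(1, n_max + 1)]
    return crest, trough
```

Since the two sequences of values stay at $+u$ and $-u$, respectively, no limit for $\alpha \to 0$ can exist unless $u = 0$.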

We are now going to justify the name “approximate-plane-wave family”.
(More fully, (2.33) should be called a “locally-approximate-plane-and-
monochromatic-wave family”. This terminology, however, seems a little bit too
cumbersome.) The physical idea behind Definition 2.2.1 becomes clear if we
consider the special case that the tensor fields $\partial_a S$ and
$f_{ab}(\alpha,\cdot)$ are covariantly constant (and non-zero, as assured by
(2.34) and (2.37)), i.e., that the equations $\nabla_b \partial_a S = 0$ and
$\nabla_c f_{ab}(\alpha,\cdot) = 0$ are satisfied. Then (2.33) gives a
one-parameter family of monochromatic plane waves. With respect to an inertial
system (i.e., a covariantly constant time-like vector field $V^a$ with
$g_{ab} V^a V^b = -1$), the frequency of such a wave is given by
$\omega = -\frac{1}{\alpha} V^a \partial_a S$ and the spatial wave covector is
given by $k_a = \frac{1}{\alpha} \partial_a S - \omega\, g_{ab} V^b$. Hence, the
limit $\alpha \to 0$ corresponds to infinitely high frequency with respect to
all inertial systems $V^a$ with $V^a \partial_a S \neq 0$.

Now this is a very special case since on a spacetime without symmetry
there are no non-zero covariantly constant vector fields. Therefore, as we
want to work with ansatz (2.33) on an arbitrary spacetime, we cannot assume
that $\partial_a S$ and $f_{ab}(\alpha,\cdot)$ are covariantly constant.
However, if we restrict our consideration to a sufficiently small
neighborhood, $\partial_a S$ and $f_{ab}(\alpha,\cdot)$ deviate arbitrarily
little from being covariantly constant. Similarly, on a sufficiently small
neighborhood, any time-like vector field $V^a$ with $g_{ab} V^a V^b = -1$
deviates arbitrarily little from an inertial system. However small this
neighborhood may be, by choosing $\alpha$ sufficiently small we can have
arbitrarily many wave periods in this small spacetime region.

This reasoning justifies the terminology introduced in Definition 2.2.1.
Please note that (2.34) and (2.37) are essential to guarantee that (2.33) gives
an approximately plane and monochromatic wave near each point for $\alpha$
sufficiently close to zero.
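
The frequency and the spatial wave covector of the covariantly constant special case can be checked with a few lines of Python. The sketch below assumes Minkowski spacetime with metric $\mathrm{diag}(1,1,1,-1)$ in coordinates $(x^1,x^2,x^3,x^4)$, the observer $V = (0,0,0,1)$ and the eikonal function $S = x^1 - x^4$; these choices and all function names are hypothetical and serve only as an example.

```python
# Minkowski metric in coordinates (x^1, x^2, x^3, x^4), with x^4 the time
# coordinate (an assumption made for this example).
G = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]


def frequency(alpha, V, dS):
    """omega = -(1/alpha) V^a d_a S."""
    return -sum(V[a] * dS[a] for a in range(4)) / alpha


def spatial_wave_covector(alpha, V, dS):
    """k_a = (1/alpha) d_a S - omega g_ab V^b."""
    omega = frequency(alpha, V, dS)
    V_low = [sum(G[a][b] * V[b] for b in range(4)) for a in range(4)]
    return [dS[a] / alpha - omega * V_low[a] for a in range(4)]


V = [0.0, 0.0, 0.0, 1.0]      # observer at rest, g_ab V^a V^b = -1
dS = [1.0, 0.0, 0.0, -1.0]    # gradient of S = x^1 - x^4
```

Halving $\alpha$ doubles the frequency, and the covector $k_a$ has no component along $V^a$, in agreement with its interpretation as a spatial wave covector.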

In correspondence with this interpretation we shall refer to the
hypersurfaces $S = \text{const.}$ as the wave surfaces of our
approximate-plane-wave family. The alternative terms eikonal surfaces and
phase surfaces are also common. Moreover, we call

$$\omega(\alpha,x) = -\frac{1}{\alpha}\, V^a(x)\, \partial_a S(x) \qquad (2.38)$$

its frequency function and we call

$$k_a(\alpha,x) = \frac{1}{\alpha}\, \partial_a S(x) - \omega(\alpha,x)\, g_{ab}(x)\, V^b(x) \qquad (2.39)$$

its spatial wave covector field with respect to the observer field $V^a$; here
all those time-like vector fields with $g_{ab} V^a V^b = -1$ are admitted for
which $V^a \partial_a S$ has no zeros.

It is worthwhile to note that from an approximate-plane-wave family (2.33)
we can produce non-monochromatic waves by integrating over $\alpha$ with an
appropriate density function $w$,

$$F_{ab}(x) = \int_{\alpha_1}^{\alpha_2} F_{ab}(\alpha,x)\, w(\alpha)\, d\alpha. \qquad (2.40)$$

This can be viewed as a generalized Fourier synthesis. Here we have to assume
that $\alpha_1 < \alpha_2$, with $\alpha_2$ sufficiently small to justify the
approximate-plane-wave interpretation. Moreover, it is also possible to form
superpositions of approximate-plane-wave families with different eikonal
functions $S$.



2.3 Asymptotic solutions of Maxwell’s equations 



To study the dynamics of wavelike electromagnetic fields in our medium
we have to plug our approximate-plane-wave ansatz (2.33) into Maxwell's
equations, i.e., into (2.2) supplemented with our constitutive equations.
Unfortunately, only in very special cases is it possible to determine the
eikonal function $S$ and the amplitudes $f^N_{ab}$ in such a way that the
resulting equations are exactly satisfied for some $\alpha \in \mathbb{R}^+$.
It is the characteristic feature of the ray method to determine $S$ and
$f^N_{ab}$ in such a way that Maxwell's equations are satisfied, rather than
for some finite value of $\alpha$, asymptotically for $\alpha \to 0$. In this
way the ray method gives us the dynamics of wave surfaces and wave amplitudes
in the high frequency limit. To put this rigorously we introduce the following
notation.

Definition 2.3.1. For $N \in \mathbb{Z}$, an approximate-plane-wave family
$F_{ab}(\alpha,\cdot)$ in the sense of Definition 2.2.1 is called an $N$th
order asymptotic solution of Maxwell's equations if



lim(^ 17“"“' acFedCa, ■)) = 0. 

lim (3r ■ ))) = 0 ■ 



(2.41) 



Here $G_{ef}(\alpha,\cdot)$ is related to $F_{ab}(\alpha,\cdot)$ by the
constitutive equations of the medium.

In (2.41), the limits are meant to be performed pointwise with respect
to the spacetime coordinates; we shall restrict ourselves to neighborhoods on
which the convergence is uniform. For the evaluation of (2.41), the following
two observations are crucial.

(a) The metric is independent of $\alpha$ and so are the other tensor fields
that enter into the constitutive equations, i.e., $U^a$ and the tensor fields
$\varepsilon$ and $\mu$. Hence, the special form in which $\alpha$ enters into
the approximate-plane-wave ansatz (2.33) together with the linearity of the
constitutive equations implies that (2.41) is trivially satisfied for
$N < -1$.

(b) If (2.41) holds for $N = N_0$, then it holds all the more for $N < N_0$.

These two observations together suggest that $N_0$th order asymptotic
solutions can be found for arbitrarily large $N_0 \ge 0$ by first evaluating
(2.41) for $N = -1$ and then proceeding step by step up to $N = N_0$. We shall
see in the following that this inductive procedure gives us dynamical laws for
the eikonal function $S$ and, step by step, for the amplitudes $f^N_{ab}$ up to
arbitrarily large order $N = N_0$.

If we want to get dynamical laws for $f^N_{ab}$ for all $N \in \mathbb{N}$, we
have to assume that our approximate-plane-wave family (2.33) satisfies (2.41)
for all $N \in \mathbb{N}$ or, what is the same, for all $N \in \mathbb{Z}$.
In this case $F_{ab}(\alpha,\cdot)$ is called an infinite asymptotic series
solution of Maxwell's equations.




Fig. 2.1. For $N \ge 0$, an $N$th order asymptotic solution
$F_{ab}(\alpha,\cdot)$ of Maxwell's equations approaches the space of exact
solutions of Maxwell's equations asymptotically for $\alpha \to 0$, as will be
proven in Sect. 2.7.



(2.41) does, of course, not imply that the one-parameter family
$F_{ab}(\alpha,\cdot)$ converges pointwise (or in any other sense) towards an
exact solution of Maxwell's equations for $\alpha \to 0$. We have already
emphasized that for an approximate-plane-wave family the limit
$\lim_{\alpha \to 0} F_{ab}(\alpha,\cdot)$ cannot exist. This raises the
question of whether asymptotic solutions can be viewed as approximate
solutions. This question will be answered in Sect. 2.7 below by proving the
following result. Let $F_{ab}(\alpha,\cdot)$ be an approximate-plane-wave
family that is an $N$th order asymptotic solution of Maxwell's equations in a
linear dielectric and permeable medium for some $N \ge 0$. Then there exists,
locally around any one point, a one-parameter family $F^*_{ab}(\alpha,\cdot)$
of exact solutions of Maxwell's equations such that
$F^*_{ab}(\alpha,\cdot) - F_{ab}(\alpha,\cdot)$ goes to zero in the pointwise
sense (and even with respect to some finer norms involving arbitrarily high
derivatives) for $\alpha \to 0$. In other words, for $\alpha$ sufficiently
small the members of our approximate-plane-wave family can be viewed as
arbitrarily good approximations to exact solutions of Maxwell's equations.
Figure 2.1 illustrates this situation in the infinite-dimensional space of
($C^\infty$) antisymmetric second-rank tensor fields defined on some open
spacetime domain.



2.4 Derivation of the eikonal equation 
and transport equations 

In this section we derive, in a linear dielectric and permeable medium, the
dynamical equations for wave surfaces and for wave amplitudes in the high
frequency limit. We do that locally around any spacetime point $x_0$. As a
preparation, we prove the following fact.

Proposition 2.4.1. Consider an approximate-plane-wave family
$F_{ab}(\alpha,\cdot)$ that is an $N$th order asymptotic solution of Maxwell's
equations in a linear dielectric and permeable medium for some $N \ge -1$.
Then the frequency function (2.38) of $F_{ab}(\alpha,\cdot)$ with respect to
the rest system of the medium ($V^a = U^a$) has no zeros.

Proof. We introduce, around any spacetime point $x_0$, a coordinate system
adapted to $U^a$ in the sense of Definition 2.1.3. We are done if we can show
that $\partial_4 S$ is different from zero at $x_0$. By assumption, our
approximate-plane-wave family satisfies (2.41) for $N = -1$, i.e.

t»^di,Sf^ = 0, (2.42) 

r,‘^dtSr,jU% = 0, (2.43) 

where $g^0_{ab}$ is related to $f^0_{ab}$ by the constitutive equations. (Here
we made use of Lemma 2.2.1.) Now let us assume that $\partial_4 S = 0$ at
$x_0$. At this point, the $a = 4$ component of (2.42) implies

$$g^{\mu\nu}\, \partial_\mu S\, b^0_\nu = 0 \qquad (2.44)$$

for the magnetic part $b^0_\nu$ of $f^0_{ab}$, whereas the $a = \rho$
components of (2.42) imply

$$\partial_\mu S\, e^0_\nu - \partial_\nu S\, e^0_\mu = 0 \qquad (2.45)$$

for the electric part $e^0_\nu$ of $f^0_{ab}$. Similarly, (2.43) results in

$$g^{\mu\nu}\, \partial_\mu S\, d^0_\nu = 0 \qquad (2.46)$$

and

$$\partial_\mu S\, h^0_\nu - \partial_\nu S\, h^0_\mu = 0 \qquad (2.47)$$

for the electric part $d^0_\nu$ and for the magnetic part $h^0_\nu$ of
$g^0_{ab}$. Note that $S$ is real whereas the amplitudes are complex. (2.45)
and (2.46) imply

$$g^{\mu\nu}\, \overline{e}{}^0_\mu\, d^0_\nu\, \partial_\sigma S = 0. \qquad (2.48)$$

Similarly, (2.44) and (2.47) imply

$$g^{\mu\nu}\, \overline{h}{}^0_\mu\, b^0_\nu\, \partial_\sigma S = 0. \qquad (2.49)$$

Recall that we are at a point where $\partial_4 S = 0$. Thus, condition (2.34)
requires $(\partial_1 S, \partial_2 S, \partial_3 S) \neq (0,0,0)$. Hence, by
condition (c) of Definition 2.1.1, (2.48) implies that
$(e^0_1, e^0_2, e^0_3) = (0,0,0)$ and (2.49) implies that
$(b^0_1, b^0_2, b^0_3) = (0,0,0)$. This shows that our hypothesis of
$\partial_4 S$ having a zero gives a contradiction to (2.37). □

To analyze the dynamics of wave surfaces and amplitudes in the high
frequency limit near an arbitrary spacetime point $x_0$, we introduce near
$x_0$ a coordinate system which is adapted to the rest system $U^a$ of the
medium in the sense of Definition 2.1.3. We can then express electromagnetic
fields in terms of the dynamical variables $Z_1, Z_2, Z_3, Y_1, Y_2, Y_3$
introduced in (2.22). Then any approximate-plane-wave family takes the form

$$\begin{pmatrix} Z(\alpha,x) \\ Y(\alpha,x) \end{pmatrix} = \mathrm{Re}\Big\{ e^{iS(x)/\alpha} \Big( \sum_{N=0}^{N_0+1} \alpha^N \begin{pmatrix} z^N(x) \\ y^N(x) \end{pmatrix} + O(\alpha^{N_0+2}) \Big) \Big\} \qquad (2.50)$$

for any integer $N_0 \ge -1$. Here the complex amplitudes from (2.35) are
expressed in terms of $\mathbb{C}^3$-valued functions $z^N$ and $y^N$. The
following proposition gives necessary and sufficient conditions on the eikonal
function $S$ and on the amplitudes $z^N$ and $y^N$ such that (2.50) is an
asymptotic solution of Maxwell's equations.

Proposition 2.4.2. Consider, locally around any spacetime point $x_0$, a
coordinate system $(x^1,x^2,x^3,x^4)$ adapted to the rest system $U^a$ of a
linear dielectric and permeable medium. Then an approximate-plane-wave family,
represented in this coordinate system in the form (2.50), is an asymptotic
solution of Maxwell's equations in lowest non-trivial order $N = -1$ if and
only if $\partial_4 S$ has no zeros and

$$\partial_a S\, L^a \begin{pmatrix} z^0 \\ y^0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \qquad (2.51)$$

For $N_0 \ge 0$, such an approximate-plane-wave family is an $N_0$th order
asymptotic solution of Maxwell's equations if and only if, in addition,

$$(L^a \partial_a + M) \begin{pmatrix} z^N \\ y^N \end{pmatrix} = -i\, \partial_a S\, L^a \begin{pmatrix} z^{N+1} \\ y^{N+1} \end{pmatrix} \qquad (2.52)$$

for $0 \le N \le N_0$. Here $L^a$ and $M$ denote the same matrices as in the
evolution equation (2.23).

Proof. In our adapted coordinate system, we decompose the asymptotic
Maxwell's equations (2.41) into constraint part ($a = 4$) and evolution part
($a = \rho$). If these equations are satisfied by an approximate-plane-wave
family for some $N \ge -1$, Proposition 2.4.1 implies that $\partial_4 S$ has
no zeros. Under this condition the evolution part of (2.41) alone already
implies the constraint part of (2.41). This is easy to verify using the fact
that, as outlined in Sect. 2.1, the evolution equations preserve the
constraints. In other words, we can forget about the constraints and
concentrate on evaluating the evolution part of (2.41). According to (2.23),
this takes the form

$$\lim_{\alpha \to 0} \Big( \frac{1}{\alpha^N}\, (L^a \partial_a + M) \begin{pmatrix} Z(\alpha,\cdot) \\ Y(\alpha,\cdot) \end{pmatrix} \Big) = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad (2.53)$$

in terms of the variables $Z_1, Z_2, Z_3, Y_1, Y_2, Y_3$. Hence, our
approximate-plane-wave family is an asymptotic solution of Maxwell's equations
to lowest non-trivial order $N = -1$ if and only if $\partial_4 S$ has no
zeros and (2.53) is satisfied for $N = -1$. By feeding (2.50) into (2.53) for
$N = -1$ we see that the latter condition is equivalent to (2.51), owing to
Lemma 2.2.1.

For $N_0 \ge 0$, our approximate-plane-wave family is an $N_0$th order solution
if and only if in addition (2.53) is satisfied for all $0 \le N \le N_0$. Upon
feeding (2.50) into (2.53), it is easy to prove by induction over $N$ that
this is true if and only if (2.52) is satisfied for $0 \le N \le N_0$. □

Condition (d) of Definition 2.2.1 requires that, if (2.50) represents an
approximate-plane-wave family, $z^0$ and $y^0$ do not vanish simultaneously.
Clearly, such a solution of (2.51) exists if and only if

$$\det(\partial_a S\, L^a) = 0. \qquad (2.54)$$

This is a first order partial differential equation for $S$, homogeneous of
degree six with respect to the components of the gradient of $S$. If $S$
satisfies (2.54) and if $\partial_4 S$ has no zeros, $S$ is called a solution
of the eikonal equation of the linear dielectric and permeable medium
considered. By Proposition 2.4.2, this is a necessary and sufficient condition
for $S$ to be the eikonal function of an approximate-plane-wave family that
satisfies Maxwell's equations asymptotically to order $N = -1$ at least. In
the theory of partial differential equations (2.54) is called the
characteristic equation of the system of evolution equations (2.23).

In the next section we discuss the eikonal equation in our medium in more 
detail. In particular, we free ourselves from the special coordinates used so 
far. 

If we have a solution $S$ of the eikonal equation, Proposition 2.4.2 can be
used to construct an asymptotic solution of arbitrarily high order. To that end
the amplitudes $z^N$ and $y^N$ have to be determined inductively with the help
of (2.51) and (2.52). Clearly, $z^{N+1}$ and $y^{N+1}$ are not uniquely
determined through $z^N$ and $y^N$ since, for a solution of the eikonal
equation, $\partial_a S\, L^a$ has a non-trivial kernel. Let $P_S(x)$ denote
the $6 \times 6$ matrix that projects orthogonally onto the kernel of
$\partial_a S(x)\, L^a(x)$, where “orthogonally” refers to the scalar product
(2.29). For any solution $S$ of (2.54) the rank of $P_S(x)$ is bigger than or
equal to one. We shall prove later that, owing to the special form of the
matrices $L^a(x)$, the rank of $P_S(x)$ cannot be bigger than two. In general,
the rank depends, of course, on $x$.

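Let us write

$$\begin{pmatrix} z^N_\perp \\ y^N_\perp \end{pmatrix} = P_S \begin{pmatrix} z^N \\ y^N \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} z^N_\parallel \\ y^N_\parallel \end{pmatrix} = (1 - P_S) \begin{pmatrix} z^N \\ y^N \end{pmatrix}. \qquad (2.55)$$

This decomposition of the amplitudes $z^N$ and $y^N$ implies, via (2.50), a
decomposition of $Z$ and $Y$ and thus, via (2.22), a decomposition of the
electric and of the magnetic component of our approximate-plane-wave family.

In terms of the decomposition (2.55), the inductive scheme for the amplitudes
is given by the following proposition.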

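Proposition 2.4.3. Let $S$ be a solution of the eikonal equation and fix an
integer $N_0 \ge 0$. Then the one-parameter family (2.50) is an $N_0$th order
asymptotic solution of Maxwell's equations if and only if the amplitudes $z^N$
and $y^N$ satisfy

$$\begin{pmatrix} z^0_\parallel \\ y^0_\parallel \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad (2.56)$$

$$(1 - P_S)\, (L^a \partial_a + M) \begin{pmatrix} z^N \\ y^N \end{pmatrix} = -i\, \partial_a S\, L^a \begin{pmatrix} z^{N+1}_\parallel \\ y^{N+1}_\parallel \end{pmatrix} \qquad (2.57)$$

and

$$P_S L^a \partial_a \begin{pmatrix} z^N_\perp \\ y^N_\perp \end{pmatrix} + P_S M \begin{pmatrix} z^N_\perp \\ y^N_\perp \end{pmatrix} = -P_S\, (L^a \partial_a + M) \begin{pmatrix} z^N_\parallel \\ y^N_\parallel \end{pmatrix} \qquad (2.58)$$

for $0 \le N \le N_0$. (2.56) is called the $0$th order polarization condition,
(2.57) is called the $(N+1)$th order polarization condition and (2.58) is
called the $N$th order transport equation.

Proof. (2.56) is obviously equivalent to (2.51). To prove that (2.57) and
(2.58) together are equivalent to (2.52), we decompose (2.52) into two
equations by applying $1 - P_S$ and $P_S$, respectively. The first equation
gives (2.57), the second equation gives (2.58). This is readily verified with
the help of the equations $\partial_a S\, L^a P_S = 0$ and
$\partial_a S\, P_S L^a = 0$. (The first equation is trivial and the second
follows from the fact that $\partial_a S\, L^a$ is symmetric with respect to
the scalar product (2.29).) □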

Since (2.57) can be solved for $z^{N+1}_\parallel$ and $y^{N+1}_\parallel$, by
this equation the $\parallel$-components of $z^{N+1}$ and $y^{N+1}$ are
algebraically determined through the lower order amplitudes $z^N$ and $y^N$.
This gives a restriction on the allowed directions of the electric and
magnetic field vectors which justifies the name “polarization condition”.
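
The algebra behind this splitting can be illustrated with a small Python sketch. A real symmetric $2 \times 2$ matrix with one-dimensional kernel serves as a toy stand-in for the $6 \times 6$ matrix $\partial_a S\, L^a$; the matrix, the projector and the function names are assumptions for illustration only. An equation $A x = b$ with such a matrix is solvable only if $b$ has no component in the kernel (the analogue of the condition obtained by applying $P_S$), and it then fixes only the part of $x$ outside the kernel (the analogue of the polarization condition).

```python
# A real symmetric 2 x 2 matrix with a one-dimensional kernel (toy stand-in
# for d_a S L^a, which is 6 x 6 in the text).
A = [[1.0, 1.0], [1.0, 1.0]]

# Orthogonal projector onto ker A, which is spanned by (1, -1)/sqrt(2).
P = [[0.5, -0.5], [-0.5, 0.5]]


def matvec(m, x):
    """Product of a 2 x 2 matrix with a 2-vector."""
    return [sum(m[i][j] * x[j] for j in range(2)) for i in range(2)]


def split(x):
    """Return the pair (P x, (1 - P) x)."""
    kernel_part = matvec(P, x)
    complement_part = [x[i] - kernel_part[i] for i in range(2)]
    return kernel_part, complement_part


def solve_complement(b):
    """Solve A x = b for the (1 - P)-part of x; this requires P b = 0."""
    kernel_part, _ = split(b)
    if any(abs(c) > 1e-12 for c in kernel_part):
        raise ValueError("not solvable: right-hand side has a kernel component")
    # On the orthogonal complement of the kernel, A acts as multiplication by 2.
    return [c / 2.0 for c in b]
```

The kernel part of $x$ remains free in this toy problem; in the text it is this free part that is governed by the transport equation.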

If $z^N_\parallel$ and $y^N_\parallel$ are known, (2.58) gives a system of
first order differential equations for $z^N_\perp$ and $y^N_\perp$. Later we
shall associate solutions of the eikonal equation with congruences of rays.
The name “transport equation” refers to the fact that (2.58) gives us ordinary
differential equations (i.e., “transport laws”) for the components of
$z^N_\perp$ and $y^N_\perp$ along each ray, as will be shown in
Sect. 2.4 below.

In spacetime regions where $P_S$ has constant rank, (2.58) admits a
well-posed initial value problem in the following sense. If

$$1 \le \operatorname{rank} P_S = k = \text{const.} \qquad (2.59)$$

we can choose $k$ basis vector fields $a_1, \dots, a_k$ (complex six-tuples
depending on $x$), orthonormal with respect to the scalar product (2.29), such
that

Ps = ^ ttA 0 , (2.60) 

A=1 

where $\otimes$ denotes the standard tensor product on $\mathbb{C}^6$. Hence,
$z^N_\perp$ and $y^N_\perp$ are of the form

\y± j A=i 



with some $\mathbb{C}$-valued coefficient functions. Then the $N$th order
transport equation (2.58) gives a system of $k$ differential equations for
these $k$ coefficients which is symmetric hyperbolic. (This follows from the
facts that each matrix $L^a$ is symmetric with respect to the scalar product
(2.29) and that $L^4$ is close to 1.) Hence, local existence and uniqueness of
solutions for the $k$ coefficients is guaranteed for arbitrary initial values
given on a hypersurface $x^4 = \text{const}$. By solving the transport
equations in this way at each level $N$, we determine that part of the
polarization direction which is not fixed already by the polarization
condition, and we determine the intensity of our approximate plane wave.

Now it is clear how, for a solution $S$ of the eikonal equation that
satisfies the rank condition (2.59), the amplitudes $z^N$ and $y^N$ can be
determined inductively to construct an $N_0$th order asymptotic solution of
Maxwell's equations.

1. The induction starts with setting $z^0_\parallel = y^0_\parallel = 0$.

2. The $N$th step of the induction, $0 \le N \le N_0$, is given by the
following prescription. With $z^N_\parallel$ and $y^N_\parallel$ known,
determine $z^N_\perp$ and $y^N_\perp$ by solving (2.58) with arbitrary initial
values. (The only restriction on the initial values is that $z^0_\perp$ and
$y^0_\perp$ must not vanish simultaneously.) Then, determine
$z^{N+1}_\parallel$ and $y^{N+1}_\parallel$ with the help of (2.57).

The other amplitudes (i.e., $z^{N_0+1}_\perp$, $y^{N_0+1}_\perp$, and $z^N$,
$y^N$ for $N > N_0+1$) and the $O(\alpha^{N_0+2})$ term in (2.50) can be chosen
arbitrarily. (E.g., they could be set equal to zero.) Then (2.50) gives an
approximate-plane-wave family that satisfies Maxwell's equations
asymptotically to order $N_0$.

This construction can be carried through for arbitrarily large $N_0$, i.e., it
can be used to construct (non-convergent) infinite asymptotic series solutions
of Maxwell's equations. In the very special case that the induction yields
$z^N_\parallel = y^N_\parallel = 0$ for some $N \ge 1$ we can set $z^M$ and
$y^M$ equal to zero for $M \ge N$ to get an approximate-plane-wave family that
satisfies Maxwell's equations exactly for all $\alpha \in \mathbb{R}^+$.

The results of this section show how to construct, locally around any
spacetime point, an approximate-plane-wave family that satisfies Maxwell's
equations in a linear dielectric and permeable medium asymptotically to some
order $N \ge 0$. The physical relevance of those one-parameter families lies in
the fact that they can be interpreted as approximate solutions of Maxwell's
equations as well. This will be proven in Sect. 2.7 below. Already now we
emphasize that this is not true for asymptotic solutions of lowest non-trivial
order $N = -1$. In other words, if it is our goal to set up a viable
approximation scheme for exact Maxwell fields we have to consider
approximate-plane-wave families that satisfy Maxwell's equations
asymptotically to order $N = 0$ at least. In this order we get polarization
conditions that fix $z^0_\parallel$, $y^0_\parallel$, $z^1_\parallel$,
$y^1_\parallel$ and we get transport equations for $z^0_\perp$ and
$y^0_\perp$. This $N = 0$ theory is often called the geometric optics
approximation of Maxwell fields.


In the preceding section we have derived the eikonal equation of our medium,
locally around an arbitrarily chosen point, in a special coordinate system. It
is now our goal to analyze the structure of this equation and, in particular,
to rewrite the eikonal equation in covariant form.

In a coordinate system adapted to the rest system of the medium, the
eikonal equation was given by (2.54) supplemented with the condition that
$\partial_4 S$ has no zeros. Clearly, the characteristic matrix
$L(x,p) = p_a L^a(x)$ is a real $6 \times 6$ matrix, symmetric with respect to
the scalar product (2.29). Hence, it has six real eigenvalues and the
characteristic determinant $\det(p_a L^a(x))$ is the product of these
eigenvalues. If we want to bring the eikonal equation in a more explicit form
we have to determine these six eigenvalues.

First we reduce this six-dimensional eigenvalue problem to a
three-dimensional eigenvalue problem. To that end we introduce, for all $x$ in
the spacetime neighborhood considered and for all
$p = (p_1,p_2,p_3,p_4) \in \mathbb{R}^4$, the real $3 \times 3$ matrix

$$W(x,p) = \frac{1}{\sqrt{-g_{44}(x)}}\, \big( p_4\, Q(x) + p_\rho\, A^\rho(x) \big) \qquad (2.62)$$

which, by (2.25), enters into the characteristic matrix according to

$$p_a L^a(x) = p_4 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \sqrt{-g_{44}(x)} \begin{pmatrix} 0 & W(x,p)^T \\ W(x,p) & 0 \end{pmatrix}. \qquad (2.63)$$



The (strictly positive) factor $\sqrt{-g_{44}(x)}$ was introduced in (2.62) for
later convenience. Then the $3 \times 3$ matrix $W(x,p)^T W(x,p)$ is obviously
positive semidefinite and symmetric with respect to the scalar product (2.29).
Hence, it has three real eigenvectors $u_1(x,p)$, $u_2(x,p)$, $u_3(x,p)$ which
are orthonormal with respect to the scalar product (2.29), and the pertaining
eigenvalues are real and non-negative. We denote these eigenvalues by
$h_1(x,p)^2$, $h_2(x,p)^2$, $h_3(x,p)^2$ with $h_A(x,p) \ge 0$ for
$A = 1,2,3$. Similarly, the $3 \times 3$ matrix $W(x,p)\, W(x,p)^T$ has three
real eigenvectors $v_1(x,p)$, $v_2(x,p)$, $v_3(x,p)$ which are orthonormal with
respect to the scalar product (2.29), and the pertaining eigenvalues are the
same as for $W(x,p)^T W(x,p)$, i.e.,

$$W(x,p)^T W(x,p)\, u_A(x,p) = h_A(x,p)^2\, u_A(x,p), \qquad W(x,p)\, W(x,p)^T\, v_A(x,p) = h_A(x,p)^2\, v_A(x,p), \qquad (2.64)$$

for $A = 1,2,3$. The bases of eigenvectors can be chosen in such a way that

$$W(x,p)\, u_A(x,p) = h_A(x,p)\, v_A(x,p), \qquad W(x,p)^T\, v_A(x,p) = h_A(x,p)\, u_A(x,p), \qquad (2.65)$$

for $A = 1,2,3$. (In the non-degenerate case, i.e., if the eigenvalues
$h_1(x,p)^2$, $h_2(x,p)^2$, $h_3(x,p)^2$ are mutually different, the
eigenvectors $u_A(x,p)$ and $v_A(x,p)$ are unique up to sign and the equations
(2.65) are automatically true up to sign.) These equations imply that the
characteristic matrix (2.63) satisfies

$$p_a L^a(x) \begin{pmatrix} u_A(x,p) \\ \pm v_A(x,p) \end{pmatrix} = \Big( p_4 \pm \sqrt{-g_{44}(x)}\, h_A(x,p) \Big) \begin{pmatrix} u_A(x,p) \\ \pm v_A(x,p) \end{pmatrix} \qquad (2.66)$$

for $A = 1,2,3$. This equation gives us six (real) eigenvalues of the
$6 \times 6$ matrix $p_a L^a$ and pertaining eigenvectors in terms of the
eigenvalues and eigenvectors of the $3 \times 3$ matrices $W(x,p)^T W(x,p)$ and
$W(x,p)\, W(x,p)^T$. As the characteristic determinant is the product of these
six eigenvalues, the eikonal equation (2.54) takes the form



$$\prod_{A=1}^{3} \Big( (\partial_4 S)^2 + g_{44}\, h_A(\,\cdot\,, dS)^2 \Big) = 0 \qquad (2.67)$$

supplemented with the condition that $\partial_4 S$ has no zeros. To get a more
explicit form of the eikonal equation, we have to calculate the eigenvalues
$h_A(x,p)^2$ of the matrix $W(x,p)^T W(x,p)$. If we insert the general
expressions (2.26) and (2.27) for the components of the matrices $Q(x)$ and
$A^\rho(x)$ into the definition (2.62) of $W(x,p)$, we find that the components
of the matrix $R(x,p) = W(x,p)^T W(x,p)$ are

RJ {x, p) = R"\'"{x) Pa Pb (2.68) 



with 

R^\'^{x) = ^ vf{x) w \x) w^{x) vj{x) . (2.69) 

5^44 

The three eigenvalues $h_1(x,p)^2$, $h_2(x,p)^2$ and $h_3(x,p)^2$ of the matrix
$R(x,p)$ are then given by

hl/2{x,pf = \R°'\‘^{x)paPb ± 

J (i R^/ix) - i R<^,<’(x) PaPtPcPd , (2.70) 

hsix.p)'^ = 0 . 

The appearance of the square root in (2.70) has the unpleasant consequence
that $h_1$ and $h_2$ might fail to be differentiable at some points even if all
input functions are $C^\infty$ as tacitly assumed throughout Part I. In the
following we assume that $h_1$ and $h_2$ are $C^\infty$ functions at all points
with $(p_1,p_2,p_3,p_4) \neq (0,0,0,0)$.
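
The eigenvalue structure expressed by (2.63) to (2.66) can be checked in a simple special case. The Python sketch below assumes the Euclidean scalar product on three-vectors and takes $W$ to act as a cross product, $W u = k \times u$, with a fixed vector $k$; this choice of $W$, the scale factor `s` (standing in for $\sqrt{-g_{44}}$) and all function names are assumptions for illustration and are not the general $W(x,p)$ of (2.62). For this $W$ the matrix $W^T W$ has the eigenvalues $|k|^2$, $|k|^2$ and $0$, so that one of the three $h_A$ vanishes.

```python
def cross(k, u):
    """Euclidean cross product k x u of two 3-vectors."""
    return [k[1] * u[2] - k[2] * u[1],
            k[2] * u[0] - k[0] * u[2],
            k[0] * u[1] - k[1] * u[0]]


def apply_block(p4, s, k, u, v):
    """Apply p4*1 + s*[[0, W^T], [W, 0]] to the six-vector (u, v).

    Here W x = k x x (cross product). W is antisymmetric, so W^T v = -(k x v).
    """
    kv = cross(k, v)
    ku = cross(k, u)
    upper = [p4 * u[i] - s * kv[i] for i in range(3)]
    lower = [p4 * v[i] + s * ku[i] for i in range(3)]
    return upper, lower


# Sample data: k along the third axis, so h = |k| = 3 for u1 and h = 0 for u3.
k = [0.0, 0.0, 3.0]
u1, v1 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]   # W u1 = 3 v1,  W^T v1 = 3 u1
u3, v3 = [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]   # W u3 = 0,     W^T v3 = 0
```

With `p4 = 1.0` and `s = 2.0`, the six-vectors $(u_1, \pm v_1)$ are eigenvectors with eigenvalues $1 \pm 2 \cdot 3$, and $(u_3, v_3)$ is an eigenvector with eigenvalue $p_4$, in line with the pattern $p_4 \pm \sqrt{-g_{44}}\, h_A$.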

The whole calculation was done around an arbitrarily chosen spacetime
point $x_0$, in a coordinate system adapted to the rest system $U^a$ of the
medium. From Sect. 2.1 we know that such a coordinate system is unique, locally
near $x_0$, to within coordinate transformations of the special form (2.18). If
we perform such a coordinate change, viewing $p = (p_1,p_2,p_3,p_4)$ as
canonical momentum coordinates conjugate to $x = (x^1,x^2,x^3,x^4)$ which
transform as

$$p_a \mapsto p'_a = \frac{\partial x^b}{\partial x'^a}\, p_b, \qquad (2.71)$$

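the components of the matrix $R(x,p) = W(x,p)^T W(x,p)$ transform according to

$$ R'_\rho{}^\sigma(x',p') = \frac{\partial x^\tau}{\partial x'^\rho}\, \frac{\partial x'^\sigma}{\partial x^\lambda}\, R_\tau{}^\lambda(x,p) , \tag{2.72} $$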

as can be read from (2.68) and (2.69). (That is the reason why we introduced
the factor $\sqrt{-g_{44}}$ in (2.62).) The eigenvalues of the matrix $R(x,p)$
are, thus, invariant under coordinate transformations of the form (2.18), i.e.,
$h'_A(x',p')^2 = h_A(x,p)^2$. In other words, $h_1$ and $h_2$ are uniquely determined
(global and invariant) functions on the cotangent bundle over spacetime.
Hence, for $A = 1$ and $A = 2$, the function
$$ H_A(x,p) = \tfrac12 \Bigl( h_A(x,p)^2 - U^a(x)\, U^b(x)\, p_a p_b \Bigr) \tag{2.73} $$

is a uniquely determined (global and invariant) function on the cotangent
bundle over spacetime. We refer to $H_1$ and $H_2$ as the partial Hamiltonians
of our linear dielectric and permeable medium. The eikonal equation can then
be formulated in the following way.

Proposition 2.5.1. A real-valued function S, defined on some open spacetime
region U, is a solution of the eikonal equation if and only if

$$ H_1\bigl(x, \partial S(x)\bigr)\, H_2\bigl(x, \partial S(x)\bigr) = 0 \tag{2.74} $$

and $\partial S(x) \neq (0,0,0,0)$ for all $x \in U$. Here $H_1$ and $H_2$ denote the partial
Hamiltonians introduced in (2.73).

Proof. S is a solution of the eikonal equation near any spacetime point if and
only if, in adapted coordinates near this point, (2.67) holds and $\partial_4 S$ has no
zeros. Since, by (2.70), $h_3$ vanishes, this is true if and only if

$$ \Bigl( h_1(\,\cdot\,,\partial S)^2 + \frac{1}{g_{44}} (\partial_4 S)^2 \Bigr) \Bigl( h_2(\,\cdot\,,\partial S)^2 + \frac{1}{g_{44}} (\partial_4 S)^2 \Bigr) = 0 \tag{2.75} $$

holds and $\partial_4 S$ has no zeros. From (2.70) we read that $h_1(x,p)$ and $h_2(x,p)$
are non-zero at points $(x,p)$ with $p_4 = 0$ but $(p_1,p_2,p_3) \neq (0,0,0)$. (This
follows from the fact that $(v_\rho{}^\sigma(x))$ and $(w_\rho{}^\sigma(x))$ are invertible $3 \times 3$ matrices
and that the kernel of the matrix $(\eta_{\sigma 4}{}^{\rho\mu}(x)\, p_\mu)$ is exactly one-dimensional if
$(p_1,p_2,p_3) \neq (0,0,0)$.) Hence, for a solution of (2.75) the condition $\partial_4 S \neq 0$
is equivalent to $\partial S \neq (0,0,0,0)$. With the help of (2.73) we can rewrite (2.75)
in the coordinate invariant form (2.74). □

We shall refer to the equations

$$ H_A\bigl(x, \partial S(x)\bigr) = 0 \tag{2.76} $$

for $A = 1$ and $A = 2$ as the partial eikonal equations. A solution of the
eikonal equation has to satisfy at each point at least one of the two partial
eikonal equations. In the terminology of classical mechanics, (2.76) is called
the Hamilton-Jacobi equation of the Hamiltonian $H_A$.

The set of all $(x,p)$ with $p \neq (0,0,0,0)$ and

$$ H_A(x,p) = 0 \tag{2.77} $$

is called the A-branch of the characteristic variety and the equation (2.77)
is called the A-dispersion relation of our medium. The following proposition
gives some information on the geometry of the A-branch of the characteristic
variety.

Proposition 2.5.2. For $A = 1$ and $A = 2$, the partial Hamiltonian $H_A$
introduced in (2.73) has the following properties.

(a) $H_A$ is homogeneous of degree two with respect to the momentum coordinates,

$$ H_A(x,tp) = t^2 H_A(x,p) \tag{2.78} $$

for all real numbers t.

(b) $H_A$ satisfies the differential equation

$$ U_a(x)\, \frac{\partial H_A}{\partial p_a}(x,p) = U^b(x)\, p_b . \tag{2.79} $$

(c) At all points $(x,p)$ with $p \neq (0,0,0,0)$ but $U^b(x)\, p_b = 0$ the partial
Hamiltonian is strictly positive,

$$ H_A(x,p) > 0 . \tag{2.80} $$

In (b) and (c), $U^a$ denotes the rest system of the medium.

Proof. (a) is obviously true in the special coordinates where $h_A(x,p)^2$ is
given by (2.70). As a consequence, it is true in any coordinates since the
conjugate momenta transform homogeneously according to (2.71). To prove
(b), we read from (2.70) that, in the special coordinates used there, the
momentum coordinates enter into $h_A(x,p)^2$ only in terms of the combination
$g_{44}(x)\, p_\sigma - g_{4\sigma}(x)\, p_4$. Thus, the coordinate invariant differential equation
$U_a\, \partial\bigl(h_A(x,p)^2\bigr)/\partial p_a = 0$ holds true, and (2.79) follows from (2.73). To
prove (c) it suffices to verify from (2.70) that $h_A(x,p)^2$ is non-zero if, in the
coordinates used there, $p_4 = 0$ but $(p_1,p_2,p_3) \neq (0,0,0)$. This follows from the
fact that $(v_\rho{}^\sigma(x))$ and $(w_\rho{}^\sigma(x))$ are invertible $3 \times 3$ matrices and that the
kernel of the matrix $(\eta_{\sigma 4}{}^{\rho\mu}(x)\, p_\mu)$ is exactly one-dimensional if
$(p_1,p_2,p_3) \neq (0,0,0)$. □

By differentiating (2.78) with respect to t and setting t = 1 afterwards,
part (a) of this proposition implies that $H_A$ satisfies the equations

$$ p_a\, \frac{\partial H_A}{\partial p_a}(x,p) = 2\, H_A(x,p) . \tag{2.81} $$

Thus, the Hamiltonian $H_A$ is similar to the quadratic form of a metric tensor,
$H(x,p) = \tfrac12 g^{ab}(x)\, p_a p_b$, but with a metric tensor that depends not only on
x but also homogeneously on p. Such generalized metrics are usually called
Finsler metrics; we may thus say that each partial Hamiltonian $H_A$ defines a
Finsler metric on the cotangent bundle over spacetime. Note, however, that
some authors include the assumption of positive-definiteness in the definition
of the term "Finsler metric", whereas our metric $\bigl(\partial^2 H_A(x,p)/\partial p_a \partial p_b\bigr)$
cannot be positive definite. This follows from differentiating (2.79) with
respect to $p_b$, which demonstrates that $U_a U_b\, \partial^2 H_A(x,p)/\partial p_a \partial p_b < 0$. For
literature on Finsler structures we refer to Rund [124] and to Asanov [10].
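The homogeneity property (2.78) and the Euler relation (2.81) can be illustrated numerically. The sketch below uses a hypothetical toy Hamiltonian (not one derived from a physical medium) whose spatial part combines a quadratic form with the square root of a quartic form, so that it is homogeneous of degree two without being quadratic in the momenta. The gradient is approximated by central differences; all names and numerical values are illustrative assumptions.

```python
import math

def hamiltonian(p):
    """Toy Hamiltonian, homogeneous of degree two in p = (p1, p2, p3, p4)."""
    p1, p2, p3, p4 = p
    quadratic = 0.5 * (p1 * p1 + 0.8 * p2 * p2 + 0.6 * p3 * p3)
    quartic = 0.04 * (p1 * p1 - p2 * p2) ** 2 + 0.01 * p3 ** 4
    h_squared = quadratic + math.sqrt(quartic)   # degree two, but not a quadratic form
    return 0.5 * (h_squared - p4 * p4)

def gradient(f, p, eps=1e-6):
    """Central-difference approximation of the derivatives of f with respect to p_a."""
    grad = []
    for a in range(len(p)):
        up = list(p)
        down = list(p)
        up[a] += eps
        down[a] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

p = [0.7, -0.4, 1.1, 0.9]
t = -2.5

scaled = hamiltonian([t * pa for pa in p])        # H(x, t p)
expected = t * t * hamiltonian(p)                 # t^2 H(x, p)
euler = sum(pa * ga for pa, ga in zip(p, gradient(hamiltonian, p)))  # p_a dH/dp_a
```

Within discretization error, `scaled` agrees with `expected` and `euler` agrees with `2 * hamiltonian(p)`.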
From parts (b) and (c) of Proposition 2.5.2 we read that

$$ \Bigl( \frac{\partial H_A}{\partial p_1}(x,p), \dots , \frac{\partial H_A}{\partial p_4}(x,p) \Bigr) \neq (0,0,0,0) \tag{2.83} $$

on the A-branch of the characteristic variety. Hence, this branch is a
codimension-one submanifold of the cotangent bundle which is transverse to the
fibers. By part (a) of Proposition 2.5.2, the intersection of this manifold with
each fiber has a "conic" structure.

In general, the union of the 1-branch and of the 2-branch of the charac- 
teristic variety need not be a manifold. The two branches might intersect or 
coalesce. It is, of course, also possible that the two branches coincide com- 
pletely. (This is necessarily true if the medium is isotropic, as we shall verify 
soon.) Whenever the two branches do not coincide, the medium is called
birefringent or double-refractive.

The fact that the two branches can intersect or coalesce is related to the
following unpleasant feature. Whereas (2.83) guarantees that either partial
eikonal equation (2.76) can be solved, locally around any one point, for one
of the partial derivatives $\partial_1 S$, $\partial_2 S$, $\partial_3 S$, $\partial_4 S$, this is not necessarily true
for the full eikonal equation (2.74). Hence it is not guaranteed that we can
find a hypersurface through each point such that initial data for S on that
hypersurface determine a solution of (2.74) uniquely.

The term "birefringence" refers to the fact that a light wave that enters
such a medium from vacuum will split up, in general, into two waves. In
the approximate-plane-wave setting considered here, one of the two waves
has an eikonal function that solves (2.76) with $A = 1$, the other one with
$A = 2$. In general, a solution of the full eikonal equation (2.74) can satisfy
(2.76) with $A = 1$ at some points and with $A = 2$ at other points. Moreover,
there might be solutions of the full eikonal equation which solve (2.76) with
$A = 1$ and with $A = 2$ simultaneously. If the two branches of the characteristic
variety coincide, this is true for all solutions of the full eikonal equation.

It is worthwhile to note that the partial Hamiltonians can be changed
according to

$$ H_A(x,p) \longmapsto \tilde{H}_A(x,p) = F_A(x,p)\, H_A(x,p) \tag{2.84} $$

for $A = 1,2$, where $F_A$ is any real-valued function that has no zeros on the
A-branch of the characteristic variety. Clearly, such a transformation does
not affect the solutions of the partial eikonal equations. In this sense, the
dynamics of wave surfaces in our medium determines two equivalence classes
of Hamiltonians. A transformation of the form (2.84) does not, of course,
preserve the degree-two homogeneity of $H_A$ with respect to the momentum
coordinates. Thus, it will lead to a representation in which the Finsler
structure is "hidden".

Finally, we illustrate the results of this section by considering two more 
special kinds of media. 
Example 2.5.1. The Isotropic Case

If our linear dielectric and permeable medium is isotropic in the sense of 
Definition 2.1.2, the two branches of the characteristic variety coincide and 
are given by the null cone bundle of a Lorentzian metric. In particular, there 
is no birefringence in an isotropic medium. To verify these well-known facts 
with the help of our general results, we first observe that, in the isotropic 
case, (2.21) simplifies to 

vj = and wj = ;^ • (2-85) 



Upon inserting this into (2.69) and using the identity (2.7) of the Levi-Civita
tensor field, the non-zero eigenvalues in (2.70) take the form

$$ h_1(x,p)^2 = h_2(x,p)^2 = \frac{g^{ab}(x) + U^a(x)\, U^b(x)}{\varepsilon(x)\, \mu(x)}\; p_a p_b . \tag{2.86} $$

Thus, the partial Hamiltonians (2.73) coincide and are given by

$$ H(x,p) = H_1(x,p) = H_2(x,p) = \tfrac12\, g_o^{ab}(x)\, p_a p_b , \tag{2.87} $$

where

$$ g_o^{ab} = \frac{1}{\varepsilon \mu} \bigl( g^{ab} + U^a U^b \bigr) - U^a U^b \tag{2.88} $$



are the contravariant components of a Lorentzian metric which is called the
optical metric of the isotropic medium. Please note that $g_o^{ab}\, U_a U_b = -1$
and that $U^a X_a = 0$ implies $g_o^{ab}\, X_a X_b = \frac{1}{\varepsilon\mu}\, g^{ab}\, X_a X_b$. Both observations
together imply that the optical metric is, indeed, of Lorentzian signature
$(+,+,+,-)$.

The strictly positive function

$$ n = \sqrt{\varepsilon \mu} \tag{2.89} $$

is called the index of refraction of the isotropic medium. If $n = 1$ (and, in
particular, in the vacuum case $\varepsilon = \mu = 1$) the optical metric and the
spacetime metric coincide, $g_o^{ab} = g^{ab}$. Hence, the eikonal equation in an isotropic
medium has exactly the same structure as in vacuum; we just have to replace
the spacetime metric with the optical metric. This is a well-known result. It
was derived, with increasing mathematical rigor, by Gordon [50], Pham Mau
Quan [117] and Ehlers [38].
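As a quick numerical illustration of the optical metric, the following sketch evaluates $g_o^{ab} = \frac{1}{\varepsilon\mu}(g^{ab} + U^a U^b) - U^a U^b$ under simplifying assumptions that are not part of the general discussion: flat spacetime with signature $(+,+,+,-)$, a medium at rest, and constant illustrative values of $\varepsilon$ and $\mu$. It checks that $g_o^{ab} U_a U_b = -1$ and that a covector whose spatial length is $n$ times $|p_4|$ lies on the null cone of the optical metric.

```python
import math

def optical_metric(g_inv, u_up, eps, mu):
    """Contravariant optical metric (1/(eps*mu)) * (g^ab + U^a U^b) - U^a U^b."""
    n_squared = eps * mu
    return [[(g_inv[a][b] + u_up[a] * u_up[b]) / n_squared - u_up[a] * u_up[b]
             for b in range(4)] for a in range(4)]

def quadratic_form(metric, p):
    return sum(metric[a][b] * p[a] * p[b] for a in range(4) for b in range(4))

# Assumed setting: Minkowski metric, signature (+,+,+,-), medium at rest.
g_inv = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, -1.0]]
u_up = [0.0, 0.0, 0.0, 1.0]      # U^a
u_down = [0.0, 0.0, 0.0, -1.0]   # U_a = g_ab U^b
eps, mu = 2.25, 1.0              # illustrative values, n = 1.5

g_o = optical_metric(g_inv, u_up, eps, mu)
n = math.sqrt(eps * mu)

norm_u = quadratic_form(g_o, u_down)          # should equal -1
p_null = [n * 0.6, 0.0, n * 0.8, 1.0]         # spatial length n, |p_4| = 1
hamiltonian_value = 0.5 * quadratic_form(g_o, p_null)   # should vanish
```

In this rest-frame setting the null-cone condition reduces to the familiar statement that the phase velocity $|p_4| / |\vec{p}\,|$ equals $1/n$.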

Example 2.5.2. The Uniaxial Case 

Now we specialize from a general linear dielectric and permeable medium to
the case that the permeability tensor field has the same form as for vacuum,
$\mu^{ab} = g^{ab} + U^a U^b$. Moreover, we assume that the dielectricity tensor field,
which can be written in the form $\varepsilon^{ab} = \varepsilon_1 X_1^a X_1^b + \varepsilon_2 X_2^a X_2^b + \varepsilon_3 X_3^a X_3^b$ with
$g_{ab} X_\rho^a X_\sigma^b = \delta_{\rho\sigma}$, $g_{ab} X_\rho^a U^b = 0$ and $\varepsilon_\sigma > 0$ for $\sigma = 1,2,3$, has a double
eigenvalue, say $\varepsilon_1 = \varepsilon_2$. In this case the functions $h_1(x,p)^2$ and $h_2(x,p)^2$ of
(2.70) are bilinear with respect to the momentum coordinates. An example of
such a medium is a uniaxial crystal. For the partial Hamiltonians (2.73) we
find in this special case after a quick calculation

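$$ H_A(x,p) = \tfrac12\, g_{oA}^{ab}(x)\, p_a p_b \tag{2.90} $$

for $A = 1$ and $A = 2$, where

$$ g_{o1}^{ab} = \frac{1}{\varepsilon_1} \bigl( g^{ab} + U^a U^b \bigr) - U^a U^b , \qquad g_{o2}^{ab} = \frac{1}{\varepsilon_3} \bigl( X_1^a X_1^b + X_2^a X_2^b \bigr) + \frac{1}{\varepsilon_1}\, X_3^a X_3^b - U^a U^b . \tag{2.91} $$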



Hence, either branch of the characteristic variety is the null cone bundle of 
a Lorentzian metric. Generalizing the terminology from the isotropic case, 
the two metrics (2.91) can be called the optical metrics of the medium. The 
first optical metric does not distinguish a spatial direction, i.e., it is of the 
same kind as the optical metric in an isotropic medium. The second optical 
metric, however, reflects the fact that the $X_3$-direction is distinguished in the
medium considered. In a situation like this the 1-branch of the characteristic 
variety is usually called the ordinary branch whereas the 2-branch is called 
the extraordinary branch. In this terminology solutions of the partial eikonal 
equation (2.76) with A = 1 are associated with ordinary waves and solutions 
with A = 2 are associated with extraordinary waves. 
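The difference between the ordinary and the extraordinary branch can be made concrete with a small sketch. It assumes flat spacetime, a medium at rest, vacuum permeability, and dielectric eigenvalues $\varepsilon_1 = \varepsilon_2$ and $\varepsilon_3$, so that the spatial part of the first optical metric is $1/\varepsilon_1$ times the identity while the second one has entries $1/\varepsilon_3$ along $X_1$, $X_2$ and $1/\varepsilon_1$ along $X_3$. The null-cone condition then gives the phase velocity $|p_4| / |\vec{p}\,|$ as a function of the angle between the wave covector and the $X_3$-axis. The numerical values of `eps1` and `eps3` are purely illustrative.

```python
import math

def phase_velocities(theta, eps1, eps3):
    """Phase velocities (ordinary, extraordinary) for a wave covector that makes
    the angle theta with the distinguished X3 axis (rest frame, flat spacetime)."""
    k = (math.sin(theta), 0.0, math.cos(theta))   # unit spatial covector
    # Ordinary branch: spatial part of the first optical metric is (1/eps1) * identity.
    v_ord_sq = (k[0] ** 2 + k[1] ** 2 + k[2] ** 2) / eps1
    # Extraordinary branch: 1/eps3 along X1 and X2, 1/eps1 along X3.
    v_ext_sq = (k[0] ** 2 + k[1] ** 2) / eps3 + k[2] ** 2 / eps1
    return math.sqrt(v_ord_sq), math.sqrt(v_ext_sq)

eps1, eps3 = 2.75, 2.21   # illustrative values only

along_axis = phase_velocities(0.0, eps1, eps3)
across_axis = phase_velocities(math.pi / 2.0, eps1, eps3)
```

Along the $X_3$-axis the two branches touch (both velocities equal $1/\sqrt{\varepsilon_1}$); perpendicular to it the extraordinary wave propagates with $1/\sqrt{\varepsilon_3}$ while the ordinary one is unchanged.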



If the eigenvalues $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ of the dielectricity tensor field are mutually
different, the two characteristic varieties are no longer the null cone bundles of 
Lorentzian metrics. An example for such a medium is a biaxial crystal. If we 
want to speak of “optical metrics” in such a medium, we have to understand 
the term “metric” in the Finslerian sense. Both these optical Finsler metrics 
display the anisotropy of the medium in a symmetrical way, i.e., it is not 
justified to distinguish one of them by the attribute “ordinary”. For this 
reason, we prefer to speak of 1-waves and 2-waves (rather than of ordinary 
waves and extraordinary waves) in general anisotropic media. 



2.6 Discussion of transport equations and the 
introduction of rays 

In this section we associate solutions of the eikonal equation in a linear di- 
electric and permeable medium with congruences of rays. The guiding idea 
is to introduce the notion of rays in such a way that the transport equations 
(2.58) can be reinterpreted as ordinary differential equations along rays. 
According to Proposition 2.5.1 the left-hand side of the eikonal equation
has a product structure. This suggests introducing, for solutions S of the
eikonal equation, the following terminology. S is called a solution of
multiplicity two iff it satisfies both partial eikonal equations (2.76), and it is
called a solution of multiplicity one iff it satisfies exactly one of the two partial
eikonal equations. The multiplicity can, of course, change from point to point.

In the following we consider solutions of the eikonal equation on neighbor- 
hoods where the multiplicity is constant. Note that, for any given solution, 
there exists such a neighborhood near almost all spacetime points. We begin 
our discussion with solutions of multiplicity one, later we consider the some- 
what more complicated case of solutions of multiplicity two. We introduce 
the following definition. 

Definition 2.6.1. Let S be a solution of the eikonal equation according to
Proposition 2.5.1. Assume that, on some open spacetime region U, S is of
multiplicity one, i.e., that the partial eikonal equation (2.76) is satisfied for
$A = 1$, say, but not for $A = 2$ at all points of U. Then the vector field

$$ K^a(x) = \frac{\partial H_1}{\partial p_a}\bigl(x, \partial S(x)\bigr) \tag{2.92} $$

on U is called the transport vector field and its integral curves are called the
rays associated with S.

In the theory of partial differential equations the rays are also known as 
(bi-) characteristic curves. 

Owing to (2.83) the transport vector field has no zeros, i.e., the rays
are immersed curves. Changing the partial Hamiltonian according to (2.84)
corresponds to reparametrizing the rays. Please note that

$$ \partial_a S(x)\, K^a(x) = 0 . \tag{2.93} $$

This follows from the fact that $H_1$ satisfies equation (2.81) which was a
consequence of the homogeneity property established in Proposition 2.5.2 (a).
Thus, the transport vector field is tangent to the hypersurfaces S = const.

We want to show now that the transport equations can be viewed as
ordinary differential equations along rays. We do that locally around any
point $x_0$ of the neighborhood U mentioned in Definition 2.6.1. To that end we
introduce a coordinate system adapted to the rest system $U^a$ of the medium
near $x_0$. (Please recall Definition 2.1.3.) In such a coordinate system, the
partial Hamiltonians (2.73) take the form

$$ H_A(x,p) = \tfrac12 \Bigl( h_A(x,p) + \frac{p_4}{\sqrt{-g_{44}(x)}} \Bigr) \Bigl( h_A(x,p) - \frac{p_4}{\sqrt{-g_{44}(x)}} \Bigr) , \tag{2.94} $$

i.e., the partial eikonal equations (2.76) are equivalent to

$$ h_A\bigl(x, \partial S(x)\bigr) \pm \frac{\partial_4 S(x)}{\sqrt{-g_{44}(x)}} = 0 \tag{2.95} $$

for $A = 1, 2$. Here and in the following, the upper sign corresponds to negative
frequency solutions, $\partial_4 S < 0$, and the lower sign corresponds to positive
frequency solutions, $\partial_4 S > 0$. With (2.95) the partial derivative of $H_A$ takes
the form



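$$ \frac{\partial H_A}{\partial p_a}\bigl(x, \partial S(x)\bigr) = \mp \frac{\partial_4 S(x)}{\sqrt{-g_{44}(x)}} \Bigl( \frac{\partial h_A}{\partial p_a}\bigl(x, \partial S(x)\bigr) \pm \frac{\delta_4^a}{\sqrt{-g_{44}(x)}} \Bigr) . \tag{2.96} $$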



With this information at hand, we take a closer look at the Nth-order
transport equation, i.e., at equation (2.58) viewed as a differential equation
for $z^N$ and $y^N$ with the lower-order amplitudes assumed known. In the situation
of Definition 2.6.1, the projection operator $P_S(x)$ onto the kernel of
$\partial_a S(x)\, L^a(x)$ is given, in terms of the eigenvectors $u_A$ and $v_A$ of (2.66), by



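$$ P_S(x) = \begin{pmatrix} u_1(x,\partial S(x)) \\ \pm v_1(x,\partial S(x)) \end{pmatrix} \otimes \begin{pmatrix} u_1(x,\partial S(x)) \\ \pm v_1(x,\partial S(x)) \end{pmatrix} \tag{2.97} $$

and $z^N$ and $y^N$ are necessarily of the form

$$ \begin{pmatrix} z^N(x) \\ y^N(x) \end{pmatrix} = \xi^N(x) \begin{pmatrix} u_1(x,\partial S(x)) \\ \pm v_1(x,\partial S(x)) \end{pmatrix} \tag{2.98} $$

with a C-valued function $\xi^N$. After multiplication with the non-vanishing
factor $\partial_4 S(x) / g_{44}(x)$, the Nth-order transport equation (2.58) reduces to a
differential equation for the function $\xi^N$ of the form

$$ K^a(x)\, \partial_a \xi^N(x) + f(x)\, \xi^N(x) + k^N(x) = 0 . \tag{2.99} $$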



Here $K^a$ is an abbreviation for

$$ K^a(x) = \frac{\partial_4 S(x)}{g_{44}(x)} \begin{pmatrix} u_1(x,\partial S(x)) \\ \pm v_1(x,\partial S(x)) \end{pmatrix} \cdot L^a(x) \begin{pmatrix} u_1(x,\partial S(x)) \\ \pm v_1(x,\partial S(x)) \end{pmatrix} \tag{2.100} $$

and $f(x)$ and $k^N(x)$ are known C-valued functions. Clearly, (2.99) gives an
ordinary differential equation for $\xi^N$ along each integral curve of the vector
field $K^a$. We now show that $K^a$ is, indeed, the transport vector field defined
through (2.92). We first observe that (2.66) implies



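$$ \begin{pmatrix} u_A(x,p) \\ \pm v_A(x,p) \end{pmatrix} \cdot p_a L^a(x) \begin{pmatrix} u_B(x,p) \\ \pm v_B(x,p) \end{pmatrix} = \Bigl( p_4 \pm \sqrt{-g_{44}(x)}\; h_A(x,p) \Bigr) \delta_{AB} \tag{2.101} $$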
for $A, B = 1, 2$. Upon differentiation with respect to $p_b$, (2.101) yields




((^4 ± y/-g44{x) ^^{x,p)^5ab ■ ( 2 . 102 ) 

If we evaluate this equation with $A = B = 1$ along $p = \partial S(x)$, we see that
the right-hand side of (2.100) coincides with the right-hand side of (2.96) for
$A = 1$. This proves that the vector field in (2.99) is, indeed, the transport
vector field associated with S.

Now let us investigate to what extent these results carry over to the case 
of solutions of multiplicity two. In analogy to Definition 2.6.1, we introduce 
the following notions. 

Definition 2.6.2. Let S be a solution of the eikonal equation according to
Proposition 2.5.1. Assume that, on some open spacetime region U, S is of
multiplicity two, i.e., that the partial eikonal equation (2.76) is satisfied for
$A = 1$ and $A = 2$ at all points of U. Then for $A = 1$ and $A = 2$ the vector
field

$$ K_A^a(x) = \frac{\partial H_A}{\partial p_a}\bigl(x, \partial S(x)\bigr) \tag{2.103} $$

on U is called the A-transport vector field and its integral curves are called
the A-rays associated with S. We shall also refer to $K_1^a(x)$ and $K_2^a(x)$ as
the partial transport vector fields associated with S.

For a solution of multiplicity two, $K_1^a$ and $K_2^a$ may or may not coincide.
(If they are collinear, they can be made equal by a transformation of the
form (2.84).) If the two branches of the characteristic variety coincide, all
solutions are of multiplicity two with $K_1^a = K_2^a$. This is the case for an
isotropic medium where, by (2.87),

$$ K_1^a(x) = K_2^a(x) = g_o^{ab}(x)\, \partial_b S(x) . \tag{2.104} $$

Hence, there is only one congruence of rays associated with each solution of
the eikonal equation in an isotropic medium.

In the anisotropic case we have to live with the situation that solutions 
of the eikonal equation might be associated with two different congruences 
of rays. Clearly, this makes it more complicated to interpret the transport 
equations as ordinary differential equations along rays. We are now going to 
work out the details. 

In the situation of Definition 2.6.2, (2.97) is to be replaced with 
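$$ P_S(x) = \sum_{A=1}^{2} \begin{pmatrix} u_A(x,\partial S(x)) \\ \pm v_A(x,\partial S(x)) \end{pmatrix} \otimes \begin{pmatrix} u_A(x,\partial S(x)) \\ \pm v_A(x,\partial S(x)) \end{pmatrix} \tag{2.105} $$

and $z^N$ and $y^N$ are of the form

$$ \begin{pmatrix} z^N(x) \\ y^N(x) \end{pmatrix} = \sum_{A=1}^{2} \xi_A^N(x) \begin{pmatrix} u_A(x,\partial S(x)) \\ \pm v_A(x,\partial S(x)) \end{pmatrix} \tag{2.106} $$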



with two C-valued functions $\xi_1^N$ and $\xi_2^N$ to be determined. Upon multiplication
with the non-vanishing factor $\partial_4 S(x) / g_{44}(x)$, the transport equation
(2.58) gives a system of two coupled differential equations for $\xi_1^N$ and $\xi_2^N$ of
the form

$$ K_A^a(x)\, \partial_a \xi_A^N(x) + \sum_{B=1}^{2} f_{AB}(x)\, \xi_B^N(x) + k_A^N(x) = 0 , \qquad A = 1, 2 . \tag{2.107} $$



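Here $K_A^a$ is an abbreviation for

$$ K_A^a(x) = \frac{\partial_4 S(x)}{g_{44}(x)} \begin{pmatrix} u_A(x,\partial S(x)) \\ \pm v_A(x,\partial S(x)) \end{pmatrix} \cdot L^a(x) \begin{pmatrix} u_A(x,\partial S(x)) \\ \pm v_A(x,\partial S(x)) \end{pmatrix} \tag{2.108} $$

for $A = 1, 2$ and $f_{AB}$, $k_A^N$ are known C-valued functions. To put the transport
equation into the form (2.107), we made use of the fact that, by (2.102), our
multiplicity-two solution satisfies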



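$$ \begin{pmatrix} u_1(x,\partial S(x)) \\ \pm v_1(x,\partial S(x)) \end{pmatrix} \cdot L^a(x) \begin{pmatrix} u_2(x,\partial S(x)) \\ \pm v_2(x,\partial S(x)) \end{pmatrix} = \begin{pmatrix} u_2(x,\partial S(x)) \\ \pm v_2(x,\partial S(x)) \end{pmatrix} \cdot L^a(x) \begin{pmatrix} u_1(x,\partial S(x)) \\ \pm v_1(x,\partial S(x)) \end{pmatrix} = 0 . \tag{2.109} $$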



To verify that the $K_A^a$ given by (2.108) are, indeed, the partial transport
vector fields defined in (2.103), we consider (2.102) with $A = B$. This shows
that the right-hand side of (2.108) coincides with the right-hand side of (2.96).

If the two partial transport vector fields coincide, (2.107) gives an ordinary
differential equation for the tuple $(\xi_1^N, \xi_2^N)$ along the integral curves of
$K_1^a = K_2^a$. In the general case, the situation is more complicated: (2.107) with
$A = 1$ gives an ordinary differential equation for $\xi_1^N$ along the integral curves
of $K_1^a$ that involves $\xi_2^N$, and (2.107) with $A = 2$ gives an ordinary differential
equation for $\xi_2^N$ along the integral curves of $K_2^a$ that involves $\xi_1^N$.

We summarize our discussion in the following way. For a solution of the
eikonal equation in a linear dielectric and permeable medium, Definition 2.6.1
gives a transport vector field and, thus, a congruence of rays on open subsets
on which the multiplicity is one, and Definition 2.6.2 gives two partial
transport vector fields and, thus, two congruences of rays on open subsets on
which the multiplicity is two. What is left out is the set of all points where
the multiplicity changes. By continuous extension into such points we might
get pathologies such as bifurcating rays.

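Any ray is an integral curve of a vector field $K_A^a$ given by (2.103) with
$A = 1$ and/or $A = 2$. For any such integral curve $s \mapsto x(s)$ we can define a
map $s \mapsto p(s)$ by $p(s) = \partial S(x(s))$, thereby getting a solution of Hamilton's
equations

$$ \dot{x}^a(s) = \frac{\partial H_A}{\partial p_a}\bigl(x(s), p(s)\bigr) , \qquad \dot{p}_a(s) = - \frac{\partial H_A}{\partial x^a}\bigl(x(s), p(s)\bigr) , \qquad H_A\bigl(x(s), p(s)\bigr) = 0 . \tag{2.110} $$

We call any immersed curve $s \mapsto x(s)$ for which (2.110) is satisfied, with some
$s \mapsto p(s)$, an A-ray for short, $A = 1, 2$.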

If the partial Hamiltonian $H_A$ is changed into $\tilde{H}_A$ by a transformation
of the form (2.84), the A-rays undergo a reparametrization but they are
unchanged otherwise. In other words, the A-rays are determined, up to their
parametrization, by the A-branch of the characteristic variety.

In the uniaxial case discussed in Example 2.5.2, the 1-rays are called ordinary
rays and the 2-rays are called extraordinary rays. If we solve Hamilton's
equations (2.110) with the partial Hamiltonians given by (2.90), we find that
the ordinary and extraordinary rays are the light-like geodesics of the first
and the second optical metric (2.91), respectively.

In the isotropic case there is only one optical metric and the notions of
1-rays and 2-rays coincide. By solving Hamilton's equations (2.110) with
$H_1 = H_2$ given by (2.87), we find that the rays are exactly the light-like
geodesics of the optical metric. This implies, of course, in particular the
familiar textbook result that in vacuum the rays are exactly the light-like
geodesics of the spacetime metric.
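To see how rays arise as solutions of Hamilton's equations in the simplest setting, the following sketch integrates them for an isotropic medium under assumptions that go beyond the text: flat spacetime, a medium at rest, two spatial dimensions plus time, and a hypothetical index of refraction $n(y) = 1 + 0.2\,y$. In this setting the Hamiltonian built from the optical metric reduces to $H = \tfrac12 \bigl( (p_x^2 + p_y^2)/n^2 - p_t^2 \bigr)$. A standard fourth-order Runge-Kutta step is used; the function names and the step size are illustrative choices.

```python
def n_index(y):
    """Hypothetical index of refraction, increasing linearly with y."""
    return 1.0 + 0.2 * y

def dn_dy(y):
    return 0.2

def hamiltonian(state):
    x, y, t, px, py, pt = state
    return 0.5 * ((px * px + py * py) / n_index(y) ** 2 - pt * pt)

def rhs(state):
    """Right-hand side of Hamilton's equations: dx/ds = dH/dp, dp/ds = -dH/dx."""
    x, y, t, px, py, pt = state
    n = n_index(y)
    p_squared = px * px + py * py
    return [px / n ** 2,                        # dx/ds
            py / n ** 2,                        # dy/ds
            -pt,                                # dt/ds
            0.0,                                # dpx/ds (H does not depend on x)
            p_squared * dn_dy(y) / n ** 3,      # dpy/ds = -dH/dy
            0.0]                                # dpt/ds (H does not depend on t)

def rk4_step(state, ds):
    def shifted(k, factor):
        return [s + factor * ki for s, ki in zip(state, k)]
    k1 = rhs(state)
    k2 = rhs(shifted(k1, ds / 2.0))
    k3 = rhs(shifted(k2, ds / 2.0))
    k4 = rhs(shifted(k3, ds))
    return [s + ds * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def trace_ray(state, ds=1e-3, steps=2000):
    for _ in range(steps):
        state = rk4_step(state, ds)
    return state

# Initial data on the characteristic variety: n = 1 at y = 0, |p| = |p_t| = 1.
initial = [0.0, 0.0, 0.0, 1.0, 0.0, -1.0]
final = trace_ray(initial)
```

The ray starts horizontally and bends towards increasing $y$, i.e., towards larger index of refraction, while $H$ stays at zero up to integration error and $p_x$ is conserved because $H$ does not depend on $x$.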



2.7 Ray optics as an approximation scheme 

From the preceding sections we know that rays are associated with asymptotic
solutions of Maxwell's equations. We are now going to show that they are
associated, moreover, with approximate solutions of Maxwell's equations. For
the physical interpretation of ray optics this is a crucial point.

Let us start with a solution S of the eikonal equation which, in a medium
of the kind under consideration, is given by (2.74) with partial Hamiltonians
(2.73). As always, we assume that S is given on some open neighborhood of
spacetime and that its gradient has no zeros. Moreover, we have to assume in
the following that S is associated with a unique congruence of rays. In other
words, we have to assume that either S is a solution of multiplicity one or
that S is a solution of multiplicity two for which the two partial transport
vector fields coincide.

It is our goal to associate such an eikonal function S with an approximate
solution of Maxwell's equations. To that end, we fix a spacetime point and,
on a neighborhood of that point, a coordinate system adapted to the rest
system of the medium in the sense of Definition 2.1.3. We use the inductive
method of Sect. 2.5 to construct an Nth-order asymptotic solution



fZ(a,x)\ _ 
\Yla,x)J 



N+l / M f \\ 'k 

Re[e^s(.)/o. ^ ( 2 . 111 ) 



of the evolution equations (2.23), where N can be chosen as large as we want.
This leaves the $O(\alpha^{N+1})$ term arbitrary. For any choice of this term, (2.111)
is automatically an Nth-order asymptotic solution of the constraints as well.
It is not difficult to check that the $O(\alpha^{N+1})$ term can be chosen in such
a way that the constraints are exactly satisfied on the initial hypersurface
$x^4 = \text{const}$. These initial values determine a unique exact solution of the
evolution equations (2.23) for each $\alpha$, thereby giving us a one-parameter
family of exact solutions that will be denoted by $Z^*(\alpha, \cdot\,)$, $Y^*(\alpha, \cdot\,)$. Now the
difference



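$$ \Delta Z(\alpha, \cdot\,) = Z(\alpha, \cdot\,) - Z^*(\alpha, \cdot\,) , \qquad \Delta Y(\alpha, \cdot\,) = Y(\alpha, \cdot\,) - Y^*(\alpha, \cdot\,) \tag{2.112} $$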



satisfies 



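$$ \bigl( L^a \partial_a + M \bigr) \begin{pmatrix} \Delta Z(\alpha, \cdot\,) \\ \Delta Y(\alpha, \cdot\,) \end{pmatrix} = O(\alpha^{N+1}) \tag{2.113} $$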

and vanishes on the initial hypersurface. We have already stressed in Sect. 2.1
that the differential operator on the left-hand side of (2.113) is symmetric
hyperbolic with respect to the scalar product (2.29). Hence, we have the
so-called energy inequalities at our disposal (see, e.g., Theorem 4.3 in
Chazarain and Piriou [26] or Theorem 2.63 in Egorov and Shubin [36]). As a
consequence, (2.113) implies the existence of a constant C such that the
inequality

■ ? V (2.114) 

holds on an appropriately chosen (relatively compact) neighborhood. The
constant C can be written as an integral over this spacetime neighborhood
where the integrand involves the (known) background tensor fields such as $g_{ab}$.

Actually, the energy inequalities allow us to estimate $\Delta Z$ and $\Delta Y$ not only in
the pointwise sense as in (2.114) but even in terms of Sobolev norms involving
arbitrarily high derivatives. For our purpose, however, (2.114) will do.
(2.114) can be rewritten in terms of the field strengths



$$ B = \begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix} \quad \text{and} \quad E = \begin{pmatrix} E_1 \\ E_2 \\ E_3 \end{pmatrix} \tag{2.115} $$

rather than in terms of our dynamical variables Z and Y. Since the constitu- 
tive equations are linear, and since the dielectricity and permeability tensor 
fields can be uniformly estimated on compact subsets of spacetime, we get 
an inequality of the form 



(AB{a, ^)\ 
\AE{a, ■)) 



(AB(a, .)\ 






(2.116) 



where $C'$ is another constant. This shows that for $\alpha$ sufficiently small $B(\alpha, \cdot\,)$
and $E(\alpha, \cdot\,)$ are arbitrarily close to the exact solution $B^*(\alpha, \cdot\,)$ and $E^*(\alpha, \cdot\,)$,
i.e., that our Nth-order asymptotic solution is indeed an approximate solution,
recall Figure 2.1. The higher N, the faster $\Delta B(\alpha, \cdot\,)$ and $\Delta E(\alpha, \cdot\,)$ converge
to zero for $\alpha \to 0$.

In physical terms, the possibility to measure electric and magnetic field
strengths is limited by some measuring accuracy $\delta$. If $\alpha$ is so small that the
right-hand side of (2.116) is smaller than $\delta^2$, (2.116) implies that an observer
moving along an $x^4$-line cannot distinguish, by way of measurement, the
approximate solution from the exact solution. It is important to realize that this
is true only for observers moving along an $x^4$-line (or at a small velocity with
respect to the $x^4$-lines). If we exclude the case that approximate solution and
exact solution coincide, we can always find observers, moving at a high velocity
with respect to the $x^4$-lines, who measure an arbitrarily large difference
between them. This follows immediately from the transformation behavior of
electric and magnetic field strength under a Lorentz transformation, given in
any textbook on special relativity. In other words, the question of whether
or not our Nth-order asymptotic solution, for some finite value of $\alpha$, can be
viewed as a valid approximation for some specific exact solution depends on
the observer field with respect to which electric and magnetic field strengths
are to be measured.

A similar observation, based on a different argument, was brought forward
by Mashhoon [92], who only considered light propagation in vacuum. He
came to the conclusion that the equations of general relativistic ray optics
have a meaning only in the limit of infinite frequency but not in the sense of
a physically reasonable approximation for any finite frequency value. We do
not share this radical point of view. Our results show that the ray method
does give a viable approximation scheme for light propagation in a medium
of the kind under consideration in the following sense. Any solution S of
the eikonal equation which is associated with a unique congruence of rays
can be viewed locally as the eikonal function of an approximate-plane-wave
family that satisfies Maxwell's equations asymptotically to order N. This
was shown in Sect. 2.4 for arbitrary $N \geq 0$. Moreover, we can find a
one-parameter family of exact solutions of Maxwell's equations such that the
difference between asymptotic solution and exact solution goes to zero for
$\alpha \to 0$ like $\alpha^{N+1}$. This follows from (2.114) or from the equivalent result
(2.116). We just have to keep in mind that the constant C in (2.114) and the
constant $C'$ in (2.116) depend on the observer field $U^a$; it is impossible to find
error bounds that are valid with respect to all observer fields simultaneously.

Having thus associated a congruence of rays with a one-parameter family
of exact solutions of Maxwell's equations $F^*_{ab}(\alpha, \cdot\,)$, it is natural to ask if
the rays are related, at least in the sense of an approximation, to the energy
flux of $F^*_{ab}(\alpha, \cdot\,)$. After all, the intuitive idea behind ray optics is to view
light propagation as a sort of energy transfer along rays. We need some more
calculations to prove that this idea is, indeed, correct for media of the kind
under consideration.

We start again with a solution S of the eikonal equation and assume
that it is associated with a unique congruence of rays. We construct, locally
around any point as outlined above, an approximate-plane-wave family
$F_{ab}(\alpha, \cdot\,)$ with eikonal function S and a one-parameter family of exact
solutions $F^*_{ab}(\alpha, \cdot\,)$ such that (2.114) holds for $N = 0$ at least. Then the energy
flux of $F^*_{ab}(\alpha, \cdot\,)$ in the rest system $U^a$ of the medium is given by

$$ S^{*a}(\alpha, \cdot\,) = U^b(x)\, T^*{}_b{}^a(\alpha, x) \tag{2.117} $$

where $T^*{}_b{}^a(\alpha, \cdot\,)$ is the Minkowski energy-momentum tensor of $F^*_{ab}(\alpha, \cdot\,)$,

T\“ia, ■ ) = F;,{a, ■ ) G*“(a, ■ ) - i (Sj” F^(c, • ) • ) , (2.118) 

The component of the energy flux four-vector (2.117) orthogonal to $U^a$ gives
the familiar Poynting vector, whereas the component parallel to $U^a$ gives the
energy density of the electromagnetic field.

In a coordinate system adapted to $U^a$, in the sense of Definition 2.1.3,
$F^*_{ab}$ and $G^{*ab}$ can be expressed in terms of our dynamical variables $Z^*$ and $Y^*$
as in (2.22). Then (2.117) takes the form

V-9ii{x) S*“(a, ■ ) = g“'(x) ?;V W V W Zl{a, ■ ) Y‘ {a,-) - 

^Stg'’^x)[z:{a, ■)Z:(a, -) + Y:{a, ■)Y:(a, •)) . (2.119) 

Since (2.114) holds with $N = 0$, (2.119) can be rewritten in terms of our
approximate-plane-wave family $Z_\rho(\alpha, \cdot\,)$, $Y_\rho(\alpha, \cdot\,)$ in the form

V-P44(®)S*“(a, ■) = 9<'-'ix)n\'’{x)v,\x)w/ix)Zxia, -)Y4a, •) - 

^52g"{x][z,{a,-)ZAa, -) + Y4a, ■)Yr{a, ■))+0{a) . (2.120) 
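Since $Z_\rho(\alpha, \cdot\,)$ and $Y_\rho(\alpha, \cdot\,)$ are given by (2.50) with $N_0 = 0$, we have

$$ Z_\lambda(\alpha, \cdot\,)\, Y_\rho(\alpha, \cdot\,) = \tfrac12\, \mathrm{Re} \bigl\{ e^{2 i S(x)/\alpha}\, z_\lambda^0(x)\, y_\rho^0(x) + z_\lambda^0(x)\, \overline{y_\rho^0(x)} \bigr\} + O(\alpha) . \tag{2.121} $$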



Let us denote by $\langle f \rangle (x)$ the average of a spacetime function f taken
over a neighborhood of x on which the gradient of S and the amplitudes $z^0$,
$y^0$ can be viewed as approximately constant. (Please recall our discussion of
approximate-plane-wave families in Sect. 2.2.) For $\alpha$ sufficiently small, the
first term on the right-hand side of (2.121) gives an average arbitrarily close
to zero. We may thus write

$$ \bigl\langle Z_\lambda(\alpha, \cdot\,)\, Y_\rho(\alpha, \cdot\,) \bigr\rangle \approx \tfrac12\, \mathrm{Re} \bigl\{ z_\lambda^0\, \overline{y_\rho^0} \bigr\} , \tag{2.122} $$



where x ^ y means that the difference between x and y can be made arbi- 
trarily small by choosing a sufficiently small. Similar expressions hold for the 
averaged products < Z\(a, •)Zp(a, •) > and < Y\(a, -)Yp(a, •) >. With 
these equations at hand, we can calculate the averaged energy flux from 
(2.119). If we assume that the background fields (i.e., the spacetime metric 
and the tensor fields that characterize the medium) do not vary significantly 
over the neighborhood used for the averaging procedure, we find 



-2 < s*^(a, • ) > « < 5 ? + 

(5| Re{z^ • Q y°} 4- (z° -z° + y^ • y°) 



(2.123) 



where the 3x3 matrices Q and from (2.26) and (2.27) are used. 

We shall now show that the right-hand side of (2.123) is, indeed,
proportional to the transport vector field of our eikonal function. To
that end, we recall that $Z(\alpha,\,\cdot\,)$, $Y(\alpha,\,\cdot\,)$ is an
$N$th order asymptotic solution of Maxwell’s equations for N = 0 at
least. Thus, $z^0$ and $y^0$ have to satisfy the 0th order polarization
condition (2.56). This implies




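$$ \begin{pmatrix} z^0(x) \\ y^0(x) \end{pmatrix} = \xi(x) \begin{pmatrix} u_1(x, dS(x)) \\ -v_1(x, dS(x)) \end{pmatrix} \qquad (2.124) $$

with a $\mathbb{C}$-valued function $\xi$ if S is of multiplicity one, and

$$ \begin{pmatrix} z^0(x) \\ y^0(x) \end{pmatrix} = \xi^A(x) \begin{pmatrix} u_A(x, dS(x)) \\ -v_A(x, dS(x)) \end{pmatrix} \qquad (2.125) $$

with $\mathbb{C}$-valued functions $\xi^1$ and $\xi^2$ if S is of
multiplicity two. In the first case the transport vector field is given
by (2.100), and (2.124) implies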




K°' = u 



(2.126) 
with some $\mathbb{R}$-valued function u. In the second case the partial
transport vector fields are given by (2.108). Since we assume that our
eikonal equation is associated with a unique congruence of rays, the two
partial transport vector fields coincide, and (2.126) holds in this case
as well. With $L^a$ given by (2.25), (2.126) takes the form

2if“ = «(«; Re{«“ • AV} + ■Qy'‘} + it (z” ■ 2 ° + ■ !/“)) . 

(2.127) 



Comparison of (2.123) and (2.127) shows that

$$ \langle S^{*a}(\alpha,\,\cdot\,) \rangle \approx u\, K^a \qquad (2.128) $$

with some $\mathbb{R}$-valued function u. In other words, the averaged
energy flux of the exact Maxwell field follows the rays up to terms that
can be made arbitrarily small by choosing α sufficiently small. Please
note that we have considered the energy flux only in the rest system of
the medium. This restriction is essential except in the vacuum case,
where there is no distinguished rest system of the medium and (2.128)
holds for the energy flux with respect to any observer field.

With these findings we have completed our discussion of light propagation
in a linear dielectric and permeable medium. In particular, we have now
established the missing link between asymptotic solutions and approximate
solutions. Let us emphasize the main point again. For the mathematical
derivation of eikonal equation and transport equations through a
mathematically well-defined limit procedure it is necessary to consider
approximate-plane-wave families that satisfy Maxwell’s equations
asymptotically for α → 0. From a physical point of view, however, this
limit α → 0 is a purely formal device. The physical meaning of the method
lies in the fact that the resulting approximate-plane-wave families give
approximate solutions of Maxwell’s equations for (sufficiently small but)
finite values of α.
In Chap. 2 we considered the homogeneous Maxwell equations (2.1)
supplemented by linear constitutive equations (2.12). This ansatz does, of course,
not cover all sorts of media with relevance to physics. Modifications of the 
following kind are possible. First, we could replace ansatz (2.12) with more 
complicated relations between field strengths and excitations. Second, we 
could introduce a current, i.e., a source term in the second Maxwell equation 
(and, similarly, in the first Maxwell equation if the hypothetical existence 
of magnetic monopoles is to be taken into account). In the latter case, the 
current must be specified by additional equations. E.g., we could assume, 
in analogy to (2.12), a linear relation between 3-current and electric field 
strength in the rest system of the medium, thereby generalizing Ohm’s law. 

For any such specification of the medium we can investigate if the
resulting system of equations determines reasonable dynamics for the
electromagnetic field. Here, a dynamical law is to be viewed as
“reasonable” if it is governed by a set of evolution equations
characterized by a local existence and uniqueness theorem. This set of
evolution equations might be supplemented by a set of constraints that
are preserved by the evolution equations. If the medium under
consideration gives rise to a dynamical law of this kind, it is
reasonable to proceed along the lines of Chap. 2, i.e., to consider
approximate-plane-wave families (2.33) that satisfy evolution equations
and constraints asymptotically for α → 0 to some order N. The passage to
ray optics has been achieved if it is possible to derive, on this
assumption, an eikonal equation of the form H(x, dS(x)) = 0, where H can
be chosen as a product, $H = H_1 \cdots H_k$, with each partial
Hamiltonian $H_A$, A = 1, ..., k, satisfying condition (2.83) on its
characteristic variety. This guarantees that each solution S of a partial
eikonal equation $H_A(x, dS(x)) = 0$ is associated with a nowhere
vanishing transport vector field (2.103) whose integral curves give a
congruence of A-rays, i.e., of solutions to Hamilton’s equations (2.6)
projected to spacetime.

Even if all this works out nicely, it is of course not guaranteed that ray 
optics in the medium under consideration can be viewed as a valid approx- 
imation scheme for exact Maxwell fields. This has to be checked, along the 
lines of Sect. 2.7, in each case individually. 
We are now going to discuss the question of if and how such a treatment
of media more general than considered in Chap. 2 is able to cover the
phenomenon of dispersion.



3.1 Methodological remarks on dispersive media 

The most important motivation to go beyond the kind of media considered in
Chap. 2 is the following. We have found that the media considered in
Chap. 2 are characterized by an eikonal equation of the form
H(x, dS(x)) = 0 where the Hamiltonian H can be chosen homogeneous with
respect to the momenta. In other words, if a 4-momentum
$p = (p_1, \dots, p_4)$ satisfies the dispersion relation at some
spacetime point x, then any multiple $tp = (tp_1, \dots, tp_4)$ also
satisfies the dispersion relation at this spacetime point. Whenever this
homogeneity property is satisfied the medium is called dispersion-free or
non-dispersive; otherwise, it is called dispersive. In a non-dispersive
medium, a ray is fixed by giving an initial event and an initial
direction for the spatial wave covector (with respect to any normalized
time-like vector field); in a dispersive medium, one has to give the
length of the spatial covector in addition. This definition can be
rephrased in terms of phase velocities and group velocities to yield the
familiar physics textbook definition of dispersive and non-dispersive
media, see Sect. 6.2 in Part II below.
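The homogeneity criterion can be illustrated numerically. The following is only a minimal sketch under stated assumptions: flat spacetime with signature (+,+,+,-), a medium at rest, and two hypothetical Hamiltonians, namely the vacuum one and one with an added constant term of the kind that arises for the plasma model of Sect. 3.2. The function names and the value of `omega_p_sq` are chosen for illustration and do not appear in the text.

```python
import math

def h_vacuum(p):
    """Vacuum Hamiltonian (1/2) g^{ab} p_a p_b in flat space, signature (+,+,+,-).

    It is homogeneous of degree two in the momenta."""
    p1, p2, p3, p4 = p
    return 0.5 * (p1**2 + p2**2 + p3**2 - p4**2)

def h_plasma(p, omega_p_sq=1.0):
    """Assumed plasma-type Hamiltonian: vacuum part plus a constant term.

    The constant term breaks homogeneity with respect to the momenta."""
    return h_vacuum(p) + 0.5 * omega_p_sq

def scale(p, t):
    """Return the multiple t*p of the momentum covector p."""
    return tuple(t * c for c in p)

def satisfies(h, p, tol=1e-9):
    """True if p lies on the dispersion relation h(p) = 0 (up to tol)."""
    return abs(h(p)) < tol

# A momentum on the vacuum dispersion relation: 9 + 16 - 25 = 0.
p_vac = (3.0, 4.0, 0.0, 5.0)
# A momentum on the plasma-type dispersion relation: 25 - 26 + 1 = 0.
p_pla = (3.0, 4.0, 0.0, math.sqrt(26.0))

# Non-dispersive: every multiple of p_vac still satisfies the relation.
print(satisfies(h_vacuum, scale(p_vac, 2.0)))
# Dispersive: p_pla satisfies the relation, but its multiple 2*p_pla does not.
print(satisfies(h_plasma, p_pla), satisfies(h_plasma, scale(p_pla, 2.0)))
```

In the dispersive case the length of the spatial covector matters, which is why it has to be prescribed in addition to the initial direction.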

Hence, we have to ask ourselves what sort of modified ansatz for the 
medium could be able to cover the phenomenon of dispersion. This is an 
important question not only from a theoretical point of view but also in view 
of applications to astrophysics. Dispersion plays a role for light propagation
in planetary and stellar atmospheres and in interstellar plasma clouds. 

A closer look at the treatment of Chap. 2 shows that the following features 
are causative for the homogeneity of the eikonal equation. 

(a) Evolution equations and constraints give a system of linear differential 
equations for the electromagnetic field strength. 

(b) The limit α → 0 is taken on a fixed background, i.e., neither evolution
equations nor constraints involve the parameter α.

As a matter of fact, it is easy to check that, whenever (a) and (b) are
satisfied, the eikonal equation arises in the form H(x, dS(x)) = 0 where
the Hamiltonian H is a homogeneous polynomial with respect to the
momenta. (Afterwards, we are free to change the Hamiltonian according to
transformations of the form given in (2.84) for $H = H_A$. This leaves,
of course, the homogeneity of the eikonal equation unchanged.) If we want
to treat dispersive media we have, thus, to modify the method of Chap. 2
by violating at least one of the two properties (a) and (b).

The most obvious idea to violate property (a) is to modify the linear
constitutive equations (2.12) by adding terms quadratic in the field
strengths. This is common practice in ordinary optics where it gives rise
to interesting effects with relevance to strong electromagnetic wave
fields. However, it is quite evident that such non-linear terms are not
typically the origin of dispersion in crystals, gases, or plasmas. As a
matter of fact, dispersion is frequently observed in situations where the
field strengths are far too weak to make non-linear modifications of the
constitutive equations necessary. Moreover, there are several technical
problems associated with the ray method if evolution equations and/or
constraints are non-linear. Contrary to the linear case, our assumption
that the approximate-plane-wave family (2.33) satisfies the differential
equations asymptotically cannot be evaluated inductively in general. The
reason is that even in lowest non-trivial order amplitudes $f^N_{ab}$
with arbitrarily large N may show up, i.e., we do not get an eikonal
equation for S alone. In other words, in a non-linear medium the
propagation of wave surfaces in the high frequency limit and, thus, the
corresponding propagation of rays depends on the amplitudes of the wave
fields. In this sense, there is no self-contained theory of ray optics
for such media. To be sure, there are some non-linear equations that do
give an eikonal equation for S alone. This is true, in particular, of
semi-linear equations, i.e., of equations which are linear in the highest
order derivatives of the dynamical variables (field strengths) with
coefficients independent of these variables. However, for the inductive
method of determining the amplitudes $f^N_{ab}$ to carry over we need
nothing less than linearity. For this reason, only in the linear case is
it possible to check, along the lines of Sect. 2.7, whether or not ray
optics gives a viable approximation scheme.

It is worthwhile to mention another problem with non-linear equations.
Suppose we know that some approximate-plane-wave family (2.33) stays
close to exact solutions, for $0 < \alpha < \alpha_0$, within some given
error bounds. Then it is still possible that a generalized Fourier
integral (2.40), formed with this family over a real interval
$[\alpha_1, \alpha_2] \subset\, ]0, \alpha_0[$, deviates from all exact
solutions by an arbitrarily large amount. In this sense, studying
approximate-plane-wave families of the form (2.33) is of limited
usefulness in a non-linear medium since it gives no information on
non-monochromatic waves.

These arguments show that it is somewhat problematic to apply the ray
method to non-linear differential equations. As a matter of fact, the
existing literature on this topic is much more “heuristic” than in the
linear case. A typical reference is the book by Jeffrey and Kawahara [66]
where many applications to physics are mentioned. Those applications
refer mainly to fluid mechanics where non-linear effects are more
important than in optics. In our context, the following strategy is
advisable. When dealing with a medium for electromagnetic fields that
gives non-linear evolution equations and/or non-linear constraints, it is
reasonable to linearize these equations around a (“background”) solution
and to apply the ray method to the linearized equations. The resulting
theory is valid for all wave fields which are sufficiently weak such that
their self-interactions, caused by the non-linearities of the full
equations, can be ignored. Interactions with the background field are, of
course, taken into account.

Following this line of thought, the only possibility to treat dispersive
media is by violating the above-mentioned property (b). At first sight,
the idea to smuggle the parameter α into the differential equations seems
alien to optics. (This is a major difference to the JWKB method in wave
mechanics. In the latter case, the role of α is played by Planck’s
constant ℏ which, evidently, appears in Schrödinger’s equation.)
Nonetheless, there is a sound method of achieving this goal. Strictly
speaking, this method comes in several variants. The common feature is
that one considers asymptotic behavior of approximate-plane-wave families
on a one-parameter family of background geometries, rather than on a
fixed background geometry. Circumstances permitting, this gives an
eikonal equation (and transport equations) in close analogy to the
treatment of Chap. 2. The crucial point is that even in the case of
linear differential equations the eikonal equation need not be
homogeneous with respect to dS, i.e., dispersion is not excluded. The
physical meaning of an eikonal equation derived that way depends, of
course, on the way in which the background geometries depend on the
parameter. In the following section we demonstrate the method by way of a
special example.



3.2 Light propagation in a non-magnetized plasma

In this section we consider a simple plasma model as a medium for electro- 
magnetic waves and we perform the passage to ray optics in such a way that 
dispersion is taken into account. Apart from some modifications, our treat- 
ment follows Breuer and Ehlers [18] [19]. For earlier references on the same 
subject we refer to Madore [90], to Bicak and Hadrava [14] and to Anile and 
Pantano [5] [6]. 

We restrict ourselves to the most simple plasma model, viz., to a two-fluid 
model with vanishing pressure. Then the dynamical system to be considered 
is governed by the equations 



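$$ \partial_{[a} F_{bc]} = 0 , \qquad (3.1) $$

$$ \nabla_b F^{ab} = J^a + e\, n\, U^a , \qquad (3.2) $$

$$ m\, U^b \nabla_b U^a = e\, F^a{}_b\, U^b , \qquad (3.3) $$

$$ \nabla_a ( n\, U^a ) = 0 , \qquad (3.4) $$

$$ g_{ab}\, U^a U^b = -1 . \qquad (3.5) $$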



(3.1) and (3.2) are the Maxwell equations for the electromagnetic field
strength tensor $F_{ab}$, where square brackets around indices mean
antisymmetrization. In (3.2), the ionic current is denoted by $J^a$,
whereas the electronic current is written as the product of electron
charge e, electron particle density n, and electron 4-velocity $U^a$. In
mathematical terms, e is a negative constant, n is a nonnegative scalar
function, and $U^a$ is a vector field normalized by (3.5).

(3.3) is the equation of motion for the electron fluid (Euler equation plus 
Lorentz force), where m is a positive constant with the meaning of the electron 
mass. Here we assume, as already mentioned, that the pressure of the electron 
fluid vanishes. This is a legitimate approximation as long as the plasma is 
sufficiently cold. 

(3.4) is the equation of charge conservation of the electron component.
Please note that (3.2) already implies conservation of the total charge,
$\nabla_a (J^a + e\, n\, U^a) = 0$, but not of the electron component alone.

We want to view (3.1)-(3.5) as a system of non-linear first order
differential equations for $F_{ab}$, n and $U^a$ with the metric $g_{ab}$
and the ionic current $J^a$ assumed known. Viewed in this sense,
(3.1)-(3.5) give us 4 + 4 + 4 + 1 + 1 = 14 component equations for
6 + 1 + 4 = 11 unknown functions. In a local coordinate system with
time-like $x^4$-lines and space-like hypersurfaces $x^4$ = const, our 14
equations split up into 11 evolution equations and 3 constraints. It is
easy to verify that the evolution equations preserve the constraints.
Moreover, Breuer and Ehlers [18] were able to show that the system of
evolution equations admits a locally well-posed initial value problem,
and that the equations (3.1)-(3.5) are linearization stable. The latter
property guarantees that solutions of the linearized equations are close
to solutions of the full equations, i.e., that linearization gives a
meaningful approximation.

This is of particular relevance for us since, following the strategy
outlined in Sect. 3.1, we are now going to linearize (3.1)-(3.5) around
some (“background”) solution. For simplicity we restrict ourselves to the
case of a background solution with vanishing electromagnetic field. In
other words, our background solution is given by a nonnegative scalar
function $\mathring{n}$ and a vector field $\mathring{U}^a$ that satisfy
the following set of equations.

$$ 0 = J^a + e\, \mathring{n}\, \mathring{U}^a , \qquad (3.6) $$

$$ \mathring{U}^b \nabla_b \mathring{U}^a = 0 , \qquad (3.7) $$

$$ \nabla_a ( \mathring{n}\, \mathring{U}^a ) = 0 , \qquad (3.8) $$

$$ g_{ab}\, \mathring{U}^a \mathring{U}^b = -1 . \qquad (3.9) $$



Now we linearize the equations (3.1)-(3.5) around this background
solution, i.e., we consider these equations for perturbed fields

$$ F_{ab} = 0 + \hat{F}_{ab} , \qquad (3.10) $$

$$ n = \mathring{n} + \hat{n} , \qquad (3.11) $$

$$ U^a = \mathring{U}^a + \hat{U}^a , \qquad (3.12) $$
and we drop all terms of second and higher order with respect to the
perturbations $\hat{F}_{ab}$, $\hat{n}$ and $\hat{U}^a$. The resulting
equations govern the dynamics of sufficiently weak electromagnetic waves
$\hat{F}_{ab}$ in our plasma which, according to $\mathring{F}_{ab} = 0$,
is assumed non-magnetized. We shall presuppose that the metric $g_{ab}$
and the ionic current $J^a$ are unperturbed. The first assumption is in
agreement with our general stipulation to work on a fixed metric
background, i.e., to disregard the back-reaction, governed by Einstein’s
field equations, of matter and electromagnetic fields on the metric. The
second assumption means that the effect of the electromagnetic wave on
the ions is ignored. This is a reasonable approximation since the inertia
of the ions is much bigger than that of the electrons. On these
assumptions, the linearized system of equations for the perturbations
takes the following form.

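$$ \partial_{[a} \hat{F}_{bc]} = 0 , \qquad (3.13) $$

$$ \nabla_b \hat{F}^{ab} = e\, \mathring{n}\, \hat{U}^a + e\, \hat{n}\, \mathring{U}^a , \qquad (3.14) $$

$$ m\, \mathring{U}^b \nabla_b \hat{U}^a + m\, \hat{U}^b \nabla_b \mathring{U}^a = e\, \hat{F}^a{}_b\, \mathring{U}^b , \qquad (3.15) $$

$$ \nabla_a ( \mathring{n}\, \hat{U}^a + \hat{n}\, \mathring{U}^a ) = 0 , \qquad (3.16) $$

$$ g_{ab}\, \mathring{U}^a \hat{U}^b = 0 . \qquad (3.17) $$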



With $g_{ab}$, $\mathring{n}$ and $\mathring{U}^a$ known, (3.13)-(3.17) is
a system of first order linear differential equations for
$\hat{F}_{ab}$, $\hat{n}$ and $\hat{U}^a$. It is our goal to find
dynamical equations for $\hat{F}_{ab}$ alone, i.e., to eliminate
$\hat{n}$ and $\hat{U}^a$. This is indeed possible provided that the
background density $\mathring{n}$ has no zeros,

$$ \mathring{n} > 0 , \qquad (3.18) $$

in the spacetime region considered. If this condition is satisfied, we
can proceed in the following way.

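From (3.14) we find, with the help of (3.9) and (3.17),

$$ e\, \hat{n} = - \mathring{U}_a \nabla_b \hat{F}^{ab} , \qquad (3.19) $$

$$ e\, \mathring{n}\, \hat{U}^a = \nabla_b \hat{F}^{cb}\, ( \delta^a_c + \mathring{U}^a \mathring{U}_c ) . \qquad (3.20) $$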
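The elimination step can be checked pointwise with a few lines of arithmetic. The sketch below is an illustration only: it works at a single point in an orthonormal frame with signature (+,+,+,-), treats the divergence $\nabla_b \hat{F}^{ab}$ as a given 4-vector `v`, and uses the relations $e\hat{n} = -\mathring{U}_a v^a$ and $e\mathring{n}\hat{U}^a = v^a + \mathring{U}^a \mathring{U}_c v^c$. All names and numerical values are hypothetical.

```python
# Pointwise check of the elimination of n_hat and U_hat, in an orthonormal
# frame with metric diag(1, 1, 1, -1). All numbers are illustrative only.
G = (1.0, 1.0, 1.0, -1.0)

def lower(vec):
    """Lower the index of a vector with the diagonal metric G."""
    return tuple(g * c for g, c in zip(G, vec))

def pair(covec, vec):
    """Contract a covector with a vector."""
    return sum(a * b for a, b in zip(covec, vec))

def eliminate(v, u, e, n0):
    """Return (n_hat, u_hat) from v^a = div F_hat, background velocity u^a,
    charge e and background density n0, following the two relations above."""
    uv = pair(lower(u), v)                       # U_a v^a
    n_hat = -uv / e                              # e n_hat = -U_a v^a
    u_hat = tuple((vc + uc * uv) / (e * n0)      # e n0 u_hat^a = v^a + U^a U_c v^c
                  for vc, uc in zip(v, u))
    return n_hat, u_hat

u = (0.75, 0.0, 0.0, 1.25)       # normalized: 0.5625 - 1.5625 = -1
v = (1.0, 2.0, -0.5, 3.0)        # stands in for the divergence of F_hat
e, n0 = -1.0, 2.0                # negative charge, positive background density

n_hat, u_hat = eliminate(v, u, e, n0)

# Linearized Maxwell equation: v^a = e n0 u_hat^a + e n_hat u^a
rebuilt = tuple(e * n0 * uh + e * n_hat * uc for uh, uc in zip(u_hat, u))
# Linearized normalization: g_ab u^a u_hat^b = 0
orthogonality = pair(lower(u), u_hat)
print(rebuilt, orthogonality)
```

The division by `n0` in `eliminate` is where the assumption of a nowhere vanishing background density enters.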



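Since we can divide by $\mathring{n}$, (3.20) can be used to eliminate
$\hat{U}^a$ from (3.15). This results in the following linear second
order differential equation for $\hat{F}_{ab}$:

$$ ( \delta^a_c + \mathring{U}^a \mathring{U}_c )\, \mathring{U}^b \nabla_b \nabla_d \hat{F}^{cd} + \big( \nabla_b \mathring{U}^b\, ( \delta^a_c + \mathring{U}^a \mathring{U}_c ) + \nabla_c \mathring{U}^a \big) \nabla_d \hat{F}^{cd} - \frac{e^2}{m}\, \mathring{n}\, \mathring{U}^b \hat{F}^a{}_b = 0 . \qquad (3.21) $$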
If we have a solution $\hat{F}_{ab}$ of (3.13) and (3.21), we can define
$\hat{n}$ and $\hat{U}^a$ by (3.19) and (3.20), respectively. It is easy
to check that then the full system of equations (3.13)-(3.17) is
satisfied. In other words, we have reduced this system (3.13)-(3.17) to
dynamical equations for $\hat{F}_{ab}$ alone, given by (3.13) and (3.21).

To rewrite (3.13) and (3.21) in a more convenient form, we express
$\hat{F}_{ab}$ in terms of a potential $A_a$,

$$ \hat{F}_{ab} = \partial_{[a} A_{b]} = \nabla_{[a} A_{b]} , \qquad (3.22) $$

and we assume that $A_a$ satisfies the Landau gauge condition in the rest
system of the background electron fluid,

$$ A_a\, \mathring{U}^a = 0 . \qquad (3.23) $$

It is a standard exercise in Maxwell theory to verify that any
antisymmetric tensor field $\hat{F}_{ab}$ that satisfies (3.13) can be
locally represented in this way, and that $A_a$ is (locally) uniquely
determined by $\hat{F}_{ab}$ up to gauge transformations

$$ A_a \mapsto A_a + \partial_a h \qquad (3.24) $$

where h is any spacetime function that is constant along the flow lines
of $\mathring{U}^a$. In other words, h can be freely prescribed on a
hypersurface transverse to those flow lines.

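With (3.22), (3.13) is automatically satisfied and (3.21) takes the form

$$ \mathcal{D}^{af} A_f = 0 \qquad (3.25) $$

where the differential operator $\mathcal{D}^{af}$ is defined by

$$ \mathcal{D}^{af} A_f = ( \delta^a_c + \mathring{U}^a \mathring{U}_c )\, \mathring{U}^b \nabla_b ( \nabla^f \nabla^c - g^{cf} \nabla^d \nabla_d ) A_f + \big( \nabla_b \mathring{U}^b\, ( \delta^a_c + \mathring{U}^a \mathring{U}_c ) + \nabla_c \mathring{U}^a \big) ( \nabla^f \nabla^c - g^{cf} \nabla^d \nabla_d ) A_f - \frac{e^2}{m}\, \mathring{n}\, ( \mathring{U}^f \nabla^a - \mathring{U}^b \nabla_b\, g^{af} ) A_f . \qquad (3.26) $$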

(3.25) determines the dynamics of electromagnetic waves in our plasma.
(3.25) consists of four component equations, but only three of them are
independent since the equation

$$ \mathring{U}_a\, \mathcal{D}^{af} A_f = 0 \qquad (3.27) $$

is identically satisfied for any $A_f$. By the Landau gauge condition
(3.23), $A_f$ has three independent components. Hence, we have as many
equations as unknown functions. In this sense, (3.25) gives a determined
system of linear third order differential equations for the
electromagnetic potential. To make this explicit, one can choose, on an
appropriate open subset of spacetime, an orthonormal tetrad field
$E_1, E_2, E_3, E_4$ with $E_4^a = \mathring{U}^a$. By (3.23), $A_f$ is
of the form $A_f = g_{fa} E^a_\mu A^\mu$ with some scalar functions
$A^1, A^2, A^3$ on that domain. Multiplication of (3.25) with
$g_{ac} E^c_\mu$ gives us three equations (numbered by μ = 1, 2, 3) for
the three functions $A^1, A^2, A^3$. It is shown in Breuer and Ehlers
[18] [19] that this system of linear differential equations admits a
local existence and uniqueness theorem for any data
$\mathring{U}^a \partial_a A^\mu$,
$\mathring{U}^a \mathring{U}^b \partial_a \partial_b A^\mu$ prescribed on
a space-like hypersurface.

Viewed in this sense, (3.25) is the system of evolution equations for elec- 
tromagnetic waves in our plasma. Those evolution equations are of second 
order in the field strengths, and they are not supplemented by constraints. 
They are, thus, quite different from the evolution equations (2.23) in a linear 
dielectric and permeable medium. Unfortunately, (3.25) is not of the kind for 
which standard theorems guarantee the validity of energy inequalities. 

With the dynamical law (3.25) at hand, we can now perform the passage to
ray optics. Since it is our goal to take dispersion into account, we
proceed in a way different from Chap. 2. As outlined in Sect. 3.1, it
will be crucial to consider one-parameter families of background fields
rather than fixed background fields. The background fields that enter
into the differential operator $\mathcal{D}^{af}$ are the metric
$g_{ab}$, the electron number density $\mathring{n}$ and the electron
4-velocity $\mathring{U}^a$. Let us fix such a set of background fields
which have to satisfy (3.6)-(3.9) and (3.18). Further, let us fix a
spacetime point and a coordinate system around this point. We assume that
the chosen point is represented by the coordinates
$x_0 = (x_0^1, x_0^2, x_0^3, x_0^4)$ and that the considered coordinate
domain is star-shaped with respect to $x_0$ in $\mathbb{R}^4$. The latter
condition means that for any point x in this domain the straight line
segment between x and $x_0$ is completely contained in this domain.
Referring to this fixed coordinate system, we define new background
fields, depending on a real parameter β, by

$$ g_{ab}(\beta, x) = g_{ab}\big( x_0 + \beta (x - x_0) \big) , \qquad (3.28) $$

$$ \mathring{n}(\beta, x) = \mathring{n}\big( x_0 + \beta (x - x_0) \big) , \qquad (3.29) $$

$$ \mathring{U}^a(\beta, x) = \mathring{U}^a\big( x_0 + \beta (x - x_0) \big) . \qquad (3.30) $$

For $0 < \beta \le 1$, the new background fields
$g_{ab}(\beta,\,\cdot\,)$, $\mathring{n}(\beta,\,\cdot\,)$,
$\mathring{U}^a(\beta,\,\cdot\,)$ are well defined on the star-shaped
domain considered, and they satisfy again equations (3.6)-(3.9) and
condition (3.18). (This observation does not carry over if an
electromagnetic background field $\mathring{F}_{ab} \neq 0$ is to be
taken into account. For a magnetized plasma, one cannot assume the same
β-dependence for all background fields $g_{ab}$, $\mathring{n}$,
$\mathring{U}^a$, and $\mathring{F}_{ab}$.)

For β → 0, the components of the background fields become constant in the
coordinate system under consideration. In this sense,
$g_{ab}(0,\,\cdot\,)$, $\mathring{n}(0,\,\cdot\,)$ and
$\mathring{U}^a(0,\,\cdot\,)$ are homogeneous fields. In particular,
$g_{ab}(0,\,\cdot\,)$ is a flat metric and $\mathring{U}^a(0,\,\cdot\,)$
is covariantly constant, i.e., an inertial system, with respect to this
metric. For this reason, we shall refer to the limit β → 0 as the
homogeneous background limit.
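The rescaling of the background fields is easy to visualize with a scalar example. The following sketch assumes a hypothetical, everywhere positive density profile `density` on a coordinate domain and builds the rescaled field x ↦ n(x0 + β(x − x0)); the helper names and the sample points are illustrative and not taken from the text.

```python
import math

def rescale(field, x0, beta):
    """Return the beta-rescaled field x -> field(x0 + beta * (x - x0))."""
    def scaled(x):
        y = tuple(c0 + beta * (c - c0) for c, c0 in zip(x, x0))
        return field(y)
    return scaled

def density(x):
    """Hypothetical background density; positive everywhere."""
    return 1.0 + 0.5 * math.sin(x[0]) ** 2

x0 = (0.3, 0.0, 0.0, 0.0)    # the fixed reference point
xa = (1.3, 0.2, 0.0, 0.5)    # some other point of the coordinate domain

def variation(beta):
    """How much the rescaled density differs between xa and x0."""
    n_beta = rescale(density, x0, beta)
    return abs(n_beta(xa) - n_beta(x0))

# The variation over the domain shrinks as beta decreases and vanishes at
# beta = 0, where the rescaled field is the constant density(x0).
for beta in (1.0, 0.1, 0.01, 0.0):
    print(beta, variation(beta))
```

The value at the reference point x0 is the same for every β, so positivity of the density at x0 is preserved in the limit.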

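If we replace in (3.26) the original background fields $g_{ab}$,
$\mathring{n}$ and $\mathring{U}^a$ by $g_{ab}(\beta,\,\cdot\,)$,
$\mathring{n}(\beta,\,\cdot\,)$ and $\mathring{U}^a(\beta,\,\cdot\,)$,
respectively, we get a one-parameter family of differential operators
$\mathcal{D}^{af}(\beta,\,\cdot\,)$. It is our plan to enter into the
differential equation
$\mathcal{D}^{af}(\beta,\,\cdot\,) A_f(\beta,\,\cdot\,) = 0$ with an
approximate-plane-wave ansatz for the potential $A_f(\beta,\,\cdot\,)$.
Hence, we consider two-parameter families of the form

$$ A_f(\alpha, \beta, x) = \frac{\alpha}{\beta}\, \mathrm{Re}\Big\{ e^{\, i S( x_0 + \beta (x - x_0) ) / \alpha}\; a_f\big( \alpha, x_0 + \beta (x - x_0) \big) \Big\} \qquad (3.31) $$

which satisfy the Landau gauge condition

$$ \mathring{U}^f(\beta, x)\, A_f(\alpha, \beta, x) = 0 . \qquad (3.32) $$

We assume that the complex amplitudes are of the form

$$ a_f(\alpha,\,\cdot\,) = \sum_{N=0}^{N_0+1} a^N_f(\,\cdot\,)\, \alpha^N + O(\alpha^{N_0+2}) \qquad (3.33) $$

for all integers $N_0 \ge -1$ and that

$$ F_{ab}(\alpha, \beta, x) = \partial_{[a} A_{b]}(\alpha, \beta, x) = \mathrm{Re}\Big\{ e^{\, i S( x_0 + \beta (x - x_0) ) / \alpha}\, \Big( i\, ( \partial_{[a} S\; a^0_{b]} )\big( x_0 + \beta (x - x_0) \big) + O(\alpha) \Big) \Big\} \qquad (3.34) $$

is an approximate-plane-wave family, in the sense of Sect. 2.2, for any
fixed β with $0 < \beta \le 1$. For an approximate plane wave in this
family, the frequency function with respect to the background electron
rest system (3.30) is then given by

$$ \omega(\alpha, \beta, x) = - \frac{\beta}{\alpha}\, \mathring{U}^a\big( x_0 + \beta (x - x_0) \big)\, \partial_a S\big( x_0 + \beta (x - x_0) \big) . \qquad (3.35) $$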

To perform the passage to ray optics, we have to assume that our
approximate-plane-wave family satisfies the dynamical equations
asymptotically. Since we have two parameters α and β at our disposal, we
can consider asymptotic behavior with respect to different kinds of
limits.

The first possibility is to keep β fixed and to consider the condition



(3.36) 
for $N \in \mathbb{Z}$. This is essentially the same kind of limit as
considered in Chap. 2. It can be characterized as the high frequency
limit on a fixed background. In the case at hand, the lowest non-trivial
order is N = −3. We leave it to the reader to compute from (3.36) with
N = −3 that the resulting eikonal equation equals the vacuum eikonal
equation in the background metric $g_{ab}(\beta,\,\cdot\,)$, i.e., that
the corresponding rays are exactly the light-like geodesics of this
background metric. In other words, if the high-frequency limit is taken
on a fixed background, the plasma has no influence on the rays. In
particular, there is no dispersion. (If this kind of limit is to be
considered, one can, of course, stick to the case β = 1 throughout, i.e.,
there is no need to introduce the parameter β at all.)

Now we want to consider a different kind of limit, namely to let β and α
go to zero simultaneously with the quotient β/α kept fixed. We can then
simply put α = β and consider the condition

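$$ \lim_{\alpha \to 0} \frac{1}{\alpha^N} \Big( \mathcal{D}^{af}(\alpha,\,\cdot\,)\, A_f(\alpha, \alpha,\,\cdot\,) \Big) = 0 \qquad (3.37) $$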

for $N \in \mathbb{Z}$. Keeping β/α fixed implies that the frequency
function (3.35) is kept fixed at the point $x_0$. Therefore, this kind of
limit can be characterized as the homogeneous background limit with fixed
frequency at $x_0$. We shall now prove that this limit gives, indeed, a
different eikonal equation. To that end, we have to assume that (3.37)
holds in lowest non-trivial order which is now given by N = 0. This is
true if and only if the equation

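$$ Q_a{}^f\, a^0_f = 0 \qquad (3.38) $$

holds at $x_0$, where $Q_a{}^f$ is an abbreviation for

$$ Q_a{}^f = \mathring{U}^b \partial_b S\, \Big( - ( \partial_a S + \mathring{U}_a \mathring{U}^c \partial_c S )\, \partial^f S + \delta_a^f\, \big( \partial^d S\, \partial_d S + \frac{e^2}{m}\, \mathring{n} \big) \Big) . \qquad (3.39) $$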

Here we have used the equation

$$ \mathring{U}^f(x_0)\, a^0_f(x_0) = 0 \qquad (3.40) $$

which follows from the Landau gauge condition (3.32). Since (3.34) is
supposed to be an approximate-plane-wave family, $a^0_f$ must be non-zero
and linearly independent of $\partial_f S$. The condition that (3.38)
admits a solution $a^0_f$ of this kind at $x_0$ gives the desired eikonal
equation at $x_0$ for S. We have, thus, to solve the eigenvalue problem
of $Q_a{}^f$ restricted to the orthocomplement of $\mathring{U}^f$. We
find that there are three real eigenvalues, viz.

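$$ \lambda_1 = \mathring{U}^b \partial_b S\, \Big( - ( \mathring{U}^c \partial_c S )^2 + \frac{e^2}{m}\, \mathring{n} \Big) , \qquad (3.41) $$

$$ \lambda_2 = \lambda_3 = \mathring{U}^b \partial_b S\, \Big( \partial^d S\, \partial_d S + \frac{e^2}{m}\, \mathring{n} \Big) . \qquad (3.42) $$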
If either $\mathring{U}^b \partial_b S = 0$ or
$\partial_a S = \pm \sqrt{\tfrac{e^2}{m}\, \mathring{n}}\; \mathring{U}_a$,
all three eigenvalues coincide and (3.38) is satisfied by any $a^0_f$.
Otherwise, we find $\lambda_1 \neq \lambda_2 = \lambda_3$. In the latter
case, the eigenspace pertaining to $\lambda_1$ is one-dimensional and
spanned by $\partial_f S + \mathring{U}_f \mathring{U}^b \partial_b S$
whereas the eigenspace pertaining to $\lambda_2 = \lambda_3$ is
two-dimensional and consists of all $X_f$ with
$\mathring{U}^f X_f = \partial^f S\, X_f = 0$.

Equation (3.38) admits a non-trivial solution $a^0_f$ which is
perpendicular to $\mathring{U}^f$ if and only if one of the eigenvalues
$\lambda_1$, $\lambda_2$, $\lambda_3$ is zero. From the form of the
eigenspaces we see that in any such case $a^0_f$ can be chosen linearly
independent of $\partial_f S$. Hence, the eikonal equation takes the form
$\lambda_1 \lambda_2 \lambda_3 = 0$ which is equivalent to

$$ \mathring{U}^b \partial_b S\, \Big( - ( \mathring{U}^c \partial_c S )^2 + \frac{e^2}{m}\, \mathring{n} \Big) \Big( \partial^d S\, \partial_d S + \frac{e^2}{m}\, \mathring{n} \Big) = 0 . \qquad (3.43) $$

Let us be precise about this result. Our assumption that the asymptotic
condition (3.37) holds in lowest non-trivial order requires that S
satisfies (3.43) at the point $x_0$ around which the construction was
done.
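The eigenvalue structure can be checked numerically in the homogeneous background limit. The sketch below is an illustration under explicit assumptions: a flat metric with signature (+,+,+,-), a background 4-velocity at rest, and the lowest-order operator taken in the form (Q a)_a = (U·p) [ −(p_a + U_a (U·p)) (p^f a_f) + (p^d p_d + w2) a_a ], where p stands for dS and `w2` stands for (e²/m) n at the point considered. This form of Q, the function names and the sample numbers are assumptions made for the illustration.

```python
# Flat background, signature (+,+,+,-); all quantities refer to one point.
G_INV = (1.0, 1.0, 1.0, -1.0)     # diagonal of g^{ab}
U_UP = (0.0, 0.0, 0.0, 1.0)       # background 4-velocity U^a (medium at rest)
U_DOWN = (0.0, 0.0, 0.0, -1.0)    # U_a = g_{ab} U^b

def contract(p, a):
    """g^{ab} p_a a_b for two covectors."""
    return sum(g * x * y for g, x, y in zip(G_INV, p, a))

def u_dot(p):
    """U^b p_b."""
    return sum(u * x for u, x in zip(U_UP, p))

def apply_q(p, a, w2):
    """Apply the assumed lowest-order operator Q to a covector a with U^f a_f = 0."""
    up, pa, pp = u_dot(p), contract(p, a), contract(p, p)
    return tuple(up * (-(p_c + u_c * up) * pa + (pp + w2) * a_c)
                 for p_c, u_c, a_c in zip(p, U_DOWN, a))

def lambda_1(p, w2):
    """Eigenvalue of the longitudinal direction, cf. (3.41)."""
    up = u_dot(p)
    return up * (-up**2 + w2)

def lambda_23(p, w2):
    """Double eigenvalue of the transverse directions, cf. (3.42)."""
    return u_dot(p) * (contract(p, p) + w2)

def is_eigenvector(p, a, lam, w2, tol=1e-9):
    """True if Q a = lam * a holds componentwise (up to tol)."""
    return all(abs(q - lam * c) < tol for q, c in zip(apply_q(p, a, w2), a))

p = (3.0, 0.0, 0.0, 2.0)              # a sample value of dS at the point
w2 = 1.5                              # sample value of (e^2/m) n
longitudinal = (3.0, 0.0, 0.0, 0.0)   # p_f + U_f U^b p_b
transverse = (0.0, 1.0, 0.0, 0.0)     # orthogonal to both U and p

print(is_eigenvector(p, longitudinal, lambda_1(p, w2), w2))
print(is_eigenvector(p, transverse, lambda_23(p, w2), w2))
```

For this sample momentum neither eigenvalue vanishes, so it does not satisfy the eikonal equation; the eikonal equation singles out exactly those momenta for which one of the two expressions is zero.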

Although we have used a fixed coordinate system around the chosen
spacetime point to perform the homogeneous background limit, the eikonal
equation is a covariant equation (i.e., independent of this coordinate
system). If S satisfies this covariant equation (3.43) on an open
spacetime domain U, it is associated with an asymptotic solution of
lowest non-trivial order, in the homogeneous-background sense, around any
point of U. That is to say, to any such S we can find a non-trivial
amplitude $a_f(\alpha,\,\cdot\,)$ on U such that the following holds. If
we choose any coordinate system around any point of U, thereby defining
the one-parameter family of operators $\mathcal{D}^{af}(\beta,\,\cdot\,)$
and the two-parameter family (3.31) of electromagnetic fields, the
asymptotic condition (3.37) is satisfied for N = 0. As a matter of fact,
a similar statement is true for any N. However, this more general result
does not follow from our reasoning so far.

Owing to the terms proportional to $\mathring{n}$, the eikonal equation
(3.43) is not homogeneous with respect to dS. This indicates dispersion.

The product structure of the eikonal equation (3.43) suggests introducing
three partial Hamiltonians

$$ H_1(x, p) = \mathring{U}^b(x)\, p_b , \qquad (3.44) $$

$$ H_2(x, p) = \tfrac{1}{2} \Big( - \mathring{U}^a(x)\, \mathring{U}^b(x)\, p_a p_b + \frac{e^2}{m}\, \mathring{n}(x) \Big) , \qquad (3.45) $$

$$ H_3(x, p) = \tfrac{1}{2} \Big( g^{ab}(x)\, p_a p_b + \frac{e^2}{m}\, \mathring{n}(x) \Big) . \qquad (3.46) $$

Our assumptions guarantee that each partial Hamiltonian satisfies
condition (2.83) on its characteristic variety. We are, of course, free
to change each partial Hamiltonian by a transformation of the form
(2.84).
The three partial Hamiltonians determine three branches of the dispersion
relation. The branches defined by $H_2$ and $H_3$ have an intersection
given by the equation
$p_a = \pm \sqrt{\tfrac{e^2}{m}\, \mathring{n}(x)}\; \mathring{U}_a(x)$.
At all points of phase space where this equation does not hold, at most
one of the three partial dispersion relations can be satisfied. (This is
true as long as our assumption (3.18) is valid.)

In analogy to Chap. 2 we assign to each solution S of the partial eikonal
equation

$$ H_A\big( x, dS(x) \big) = 0 , \qquad A = 1, 2, 3 , \qquad (3.47) $$

a (partial) transport vector field $K^a$ defined by

$$ K^a(x) = \frac{\partial H_A}{\partial p_a}\big( x, dS(x) \big) . \qquad (3.48) $$

The integral curves of $K^a$ are, again, called the (A-)rays associated
with S. The totality of all A-rays, associated with any solution of
(3.47), is found by solving Hamilton’s equations (2.6) for A = 1, 2, 3,
respectively.

It is worthwhile to mention that this definition associates a unique
congruence of rays to each solution S of the full eikonal equation
(3.43). This can be verified in the following way. In almost all cases, a
solution of the full eikonal equation satisfies exactly one of the three
partial eikonal equations (3.47). The only exception occurs if, at some
point x, the equation
$\partial_a S(x) = \pm \sqrt{\tfrac{e^2}{m}\, \mathring{n}(x)}\; \mathring{U}_a(x)$
holds such that (3.47) is satisfied for A = 2 and for A = 3
simultaneously. At such points we have two partial transport vectors,
given by (3.48) with A = 2 and with A = 3, respectively. Luckily enough,
we find from (3.45) and (3.46) that these two partial transport vectors
coincide.
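The coincidence of the two partial transport vectors on the intersection can be confirmed by a short computation. The sketch below assumes a flat background with signature (+,+,+,-), a background 4-velocity at rest, and partial Hamiltonians of the form H2 = ½(−(U·p)² + w2) and H3 = ½(g^{ab} p_a p_b + w2), with `w2` standing for the squared plasma frequency at the point considered; the names and numbers are illustrative.

```python
import math

W2 = 4.0                       # sample value of (e^2/m) n at the point
OMEGA_P = math.sqrt(W2)
U_DOWN = (0.0, 0.0, 0.0, -1.0) # U_a for U^a = (0, 0, 0, 1), signature (+,+,+,-)

def h2(p):
    """H2 = (1/2) (-(U^a p_a)^2 + w2) with U^a p_a = p_4."""
    return 0.5 * (-p[3] ** 2 + W2)

def h3(p):
    """H3 = (1/2) (g^{ab} p_a p_b + w2)."""
    return 0.5 * (p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - p[3] ** 2 + W2)

def grad_h2(p):
    """dH2/dp_a = -U^a (U^b p_b)."""
    return (0.0, 0.0, 0.0, -p[3])

def grad_h3(p):
    """dH3/dp_a = g^{ab} p_b."""
    return (p[0], p[1], p[2], -p[3])

# The two points of the intersection: p_a = +omega_p U_a and p_a = -omega_p U_a.
p_plus = tuple(OMEGA_P * c for c in U_DOWN)
p_minus = tuple(-OMEGA_P * c for c in U_DOWN)

for p in (p_plus, p_minus):
    # Both partial eikonal equations hold and the transport vectors agree.
    print(h2(p), h3(p), grad_h2(p) == grad_h3(p))

# Away from the intersection the two gradients differ in their spatial part.
p_other = (3.0, 0.0, 0.0, math.sqrt(13.0))   # satisfies H3 = 0 only
print(grad_h2(p_other) == grad_h3(p_other))
```

On the intersection the common transport vector is a multiple of the background 4-velocity, so the associated ray is a flow line of the electron fluid.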

Let us consider the three partial Hamiltonians one by one. Solutions of
the partial eikonal equation (3.47) with A = 1 are pathological insofar
as they have vanishing frequency in the background rest system of the
electron fluid, $\mathring{U}^a(x)\, \partial_a S(x) = 0$. Hence,
$\mathring{U}^a$ is not an “admissible reference system” for the
approximate-plane-wave interpretation. The transport vector field (3.48)
associated with such a solution is given by

$$ K^a(x) = \mathring{U}^a(x) , \qquad (3.49) $$

i.e., the rays are the integral curves of $\mathring{U}^a$. Note that
$H_1(\,\cdot\,, dS(\,\cdot\,)) = 0$ implies that the eigenvalues (3.41)
and (3.42) coincide, $\lambda_1 = \lambda_2 = \lambda_3 = 0$, and that
equation (3.38) is identically satisfied for all $a^0_f$. In other words,
the amplitude $f^0_{ab} = i\, \partial_{[a} S\; a^0_{b]}$ is not
restricted by any polarization condition.

For a solution of the second partial eikonal equation $H_2(x, dS(x)) = 0$
the frequency function with respect to the background rest system of the
electron fluid is determined by the equation

$$ \mathring{U}^a(x)\, \partial_a S(x) = \pm \omega_p(x) , \tag{3.50} $$

where $\omega_p$ denotes the plasma frequency defined by

$$ \omega_p(x)^2 = \frac{e^2}{m}\, n(x) . \tag{3.51} $$

For the transport vector field (3.48) associated with such a solution $S$ we find

$$ K^a(x) = \pm \omega_p(x)\, \mathring{U}^a(x) \tag{3.52} $$

such that the rays coincide, again, with the integral curves of $\mathring{U}^a$.
(Please recall that the parametrization of rays is arbitrary.) The case
$\partial_a S = \pm \omega_p \mathring{U}_a$ plays a special role since in this
case $S$ satisfies the partial eikonal equation (3.47) not only for $A = 2$ but
also for $A = 3$. For this special solution we have again
$\lambda_1 = \lambda_2 = \lambda_3$ and, thus, no polarization condition of 0th
order. For all other solutions of $H_2(x, dS(x)) = 0$, (3.38) requires that
$\hat{a}^0$ is in the eigenspace pertaining to the eigenvalue $\lambda_1$ given by
(3.41), i.e., that $\hat{a}^0_f$ is a multiple of
$\partial_f S + \mathring{U}_f \mathring{U}^c \partial_c S$. This condition implies
that the electric component of $f^0_{ab}$ with respect to $\mathring{U}^a$ is a
linear combination of $\mathring{U}_a$ and $\partial_a S$ and that the
corresponding magnetic component vanishes. This is tantamount to a longitudinal
polarization condition in the sense that the electric field strength is parallel
to the spatial wave covector, i.e.,
$f^0_{ab} \mathring{U}^b = u\, (\partial_a S + \mathring{U}^b \partial_b S\, \mathring{U}_a)$
with some real-valued function $u$. Those longitudinal modes described by the
partial Hamiltonian $H_2$ are known as plasma oscillations.

Now let us turn to the third partial Hamiltonian $H_3$. For $A = 3$, formula
(3.48) yields the same expression for the transport vector field as in vacuum,
viz.

$$ K^a(x) = g^{ab}(x)\, \partial_b S(x) . \tag{3.53} $$

Using our assumption that $n$ has no zeros, we find that the 3-rays (i.e., the
rays determined by the partial Hamiltonian $H_3$) are exactly the time-like
geodesics of the metric $\omega_p^2\, g_{ab}$ which is conformally equivalent to
$g_{ab}$. The easiest way to verify this result is by changing $H_3$ according to

$$ H_3(x,p) = \tfrac{1}{2} \big( g^{ab}(x)\, p_a p_b + \omega_p(x)^2 \big)
\;\longmapsto\;
\tilde{H}_3(x,p) = \omega_p(x)^{-2}\, H_3(x,p)
= \tfrac{1}{2} \big( \omega_p(x)^{-2}\, g^{ab}(x)\, p_a p_b + 1 \big) . \tag{3.54} $$

Since this transformation is of the form (2.84), it leaves the rays unchanged up
to reparametrization, i.e., we can use $\tilde{H}_3$ instead of $H_3$ for the
determination of the 3-rays. Solving Hamilton’s equations (2.6) with this
transformed Hamiltonian gives, of course, the time-like geodesics of the
conformally rescaled metric $\tilde{g}_{ab} = \omega_p^2\, g_{ab}$ parametrized by
$\tilde{g}_{ab}$-proper time.
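
The statement about proper-time parametrization can be checked numerically in a
toy setting. The following is only a sketch under stated assumptions, not part
of the original derivation: spacetime is taken to be 1+1-dimensional and flat
with metric diag(+1, -1) in coordinates (x, t), and the profile
`omega_p_sq(x) = 1 + x**2` is a purely hypothetical choice. The code integrates
Hamilton's equations for the rescaled Hamiltonian with a Runge-Kutta step and
evaluates the rescaled metric on the ray tangent, which should stay at -1.

```python
import math


def omega_p_sq(x):
    """Hypothetical plasma-frequency profile omega_p(x)^2 (illustrative assumption)."""
    return 1.0 + x * x


def d_omega_p_sq(x):
    """Derivative of omega_p_sq with respect to x."""
    return 2.0 * x


def h_tilde(state):
    """Rescaled Hamiltonian 0.5 * (omega_p^-2 * g^{ab} p_a p_b + 1), g = diag(+1, -1)."""
    x, _t, px, pt = state
    return 0.5 * ((px * px - pt * pt) / omega_p_sq(x) + 1.0)


def rhs(state):
    """Hamilton's equations: (dx/ds, dt/ds) = dH/dp and (dp_x/ds, dp_t/ds) = -dH/dx."""
    x, _t, px, pt = state
    w2 = omega_p_sq(x)
    q = px * px - pt * pt
    return (px / w2, -pt / w2, 0.5 * q * d_omega_p_sq(x) / (w2 * w2), 0.0)


def rk4_step(state, ds):
    """One classical fourth-order Runge-Kutta step of size ds."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * ds * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * ds * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + ds * k for s, k in zip(state, k3)))
    return tuple(
        s + ds * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_ray(px0=0.5, steps=2000, ds=1.0e-3):
    """Start on the zero level set of h_tilde at x = 0 and follow the ray."""
    x0 = 0.0
    pt0 = -math.sqrt(px0 * px0 + omega_p_sq(x0))  # sign chosen so that dt/ds > 0
    state = (x0, 0.0, px0, pt0)
    for _ in range(steps):
        state = rk4_step(state, ds)
    return state


def rescaled_norm(state):
    """Value of the rescaled metric omega_p^2 * diag(+1, -1) on the ray tangent."""
    dx, dt, _dpx, _dpt = rhs(state)
    return omega_p_sq(state[0]) * (dx * dx - dt * dt)


final = integrate_ray()
print(h_tilde(final), rescaled_norm(final))
```

In this sketch the rescaled norm equals twice the Hamiltonian minus one, so
staying on the zero level set and being parametrized by proper time of the
rescaled metric are the same statement.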

To further analyze the third partial Hamiltonian we consider a solution $S$
of $H_3(\,\cdot\,, dS(\,\cdot\,)) = 0$ but exclude the case
$\partial_a S = \pm \omega_p \mathring{U}_a$ which was already
considered above. Then (3.38) requires that $\hat{a}^0$ is in the eigenspace
pertaining to the eigenvalue $\lambda_2 = \lambda_3$ given by (3.42), i.e., that
$\hat{a}^0$ satisfies the condition

= 0 (3.55) 

in addition to the Landau gauge condition = 0. This is tantamount to a 
transverse polarization condition for the 0 *^ order amplitude db]S 

in the following sense. (3.55) and the Landau gauge condition imply that the 

A O 

electric and magnetic components of j, with respect to are perpendicular 

to the gradient of S, i.e., that — 0 and Ubfad^aS = 0. By 

(3.53), this implies that the electric and magnetic components of the 0th order
amplitude are perpendicular to the rays.

From this analysis of the three partial Hamiltonians we see that, for
transverse modes with non-zero frequency, the eikonal equation reduces to the form

$$ \partial^a S(x)\, \partial_a S(x) + \frac{e^2}{m}\, n(x) = 0 , \tag{3.56} $$

i.e., to the partial eikonal equation determined by $H_3$. On a flat spacetime,
the eikonal equation (3.56) is discussed in any textbook on plasma physics,
see, e.g., Stix [134]. On a curved spacetime, it was derived, with increasing
rigor, by Madore [90], Bičák and Hadrava [14], Anile and Pantano [5] [6] and
Breuer and Ehlers [18] [19].
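
For orientation, the flat-spacetime content of (3.56) can be spelled out in a
few lines. This is a minimal sketch under assumptions that are not in the text:
a constant plasma frequency, the signature (+, +, +, -), units with c = 1, and
a plane-wave eikonal whose gradient has spatial part of length `k` and time
component `-omega`. Under these assumptions (3.56) becomes
omega^2 = k^2 + omega_p^2. The function names are illustrative.

```python
import math


def frequency(k, omega_p):
    """Positive root of omega^2 = k^2 + omega_p^2 (flat spacetime, c = 1)."""
    return math.sqrt(k * k + omega_p * omega_p)


def phase_velocity(k, omega_p):
    """omega / k; exceeds 1 whenever omega_p > 0."""
    return frequency(k, omega_p) / k


def group_velocity(k, omega_p):
    """d(omega)/dk = k / omega; stays below 1 whenever omega_p > 0."""
    return k / frequency(k, omega_p)


k, omega_p = 3.0, 4.0
print(frequency(k, omega_p), phase_velocity(k, omega_p), group_velocity(k, omega_p))
```

The group velocity below the vacuum speed of light is the flat-space counterpart
of the statement that the 3-rays are time-like, and its dependence on `k` is
the dispersion mentioned in the text.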

If we consider the limit $n \to 0$ we reobtain the familiar eikonal equation
for light propagation in vacuum from (3.56). (Note that this is not the case
for the partial eikonal equation (3.47) with $A = 1$ or $A = 2$.) It is, thus,
admissible to consider (3.56) for any spacetime function $n$ which is
non-negative (but not necessarily strictly positive). Spacetime regions on which
$n > 0$ are to be interpreted as occupied by a plasma whereas spacetime regions
on which $n = 0$ are to be interpreted as vacuum. To stick with our general
stipulations, we assume that $n$ is a $C^\infty$ function everywhere. We can then
find ($C^\infty$) solutions $S$ of (3.56) which give us wave surfaces traveling partly
through vacuum and partly through plasma clouds. An analogous treatment
based on $H_1$ or $H_2$ rather than on $H_3$ is impossible. This indicates that $H_1$
and $H_2$ have nothing to do with electromagnetic waves passing through our
plasma. (A full discussion of this topic requires replacing our $C^\infty$ condition
with a piecewise $C^\infty$ condition and deriving junction conditions for $\partial_a S$ and
for the amplitudes $\hat{a}^N_f$ from the asymptotic condition (3.37). We shall not
embark upon such an investigation here.)

It is, therefore, justified to concentrate the discussion of light propagation
in our plasma on the partial Hamiltonian $H_3$. Using the eikonal equation
(3.56) and the 0th order polarization condition (3.55) as the starting point,
we could now go on to evaluate (3.37) inductively for $N = 1, 2, 3$, etc. This
would result in transport equations and polarization conditions for the am- 
plitudes of arbitrarily high order. We leave it to the reader to verify that, 
proceeding along the lines of Sect. 2.4, this hierarchy of equations can be 
solved inductively to construct solutions of the asymptotic condition (3.37) 
for arbitrarily large N. Unfortunately, it is a difficult problem, apparently 
unsolved so far, to prove that those asymptotic solutions are approximate 
solutions as well. The method of Sect. 2.7 does not carry over since the dif- 
ferential equation (3.25) is not of the kind for which energy inequalities are 
known to hold true. Therefore, it is hard to see how the difference between 
our asymptotic solutions and appropriate exact solutions could be estimated. 
If such error estimates do exist, they are, of course, quite different depending 
on which sort of limit is considered. For the homogeneous background limit 
with fixed frequency considered here, the error bounds must go to zero if
the background fields $g_{ab}$, $n$ and $\mathring{U}^a$ become homogeneous. In other words,
our asymptotic solutions yield good approximations if the background fields 
are sufficiently homogeneous. Clearly, this is not necessarily the case if the 
high-frequency limit on a fixed background is considered. Either limit yields 
a reasonable eikonal equation, reasonable transport equations and reason- 
able polarization conditions. The difference is in the range of validity as an 
approximation scheme for exact electromagnetic wave fields (providing this 
validity can be established, in terms of error estimates, at all). 

From the results of this section we can draw the following lesson. The 
eikonal equation for light propagation in a plasma does not only depend on 
the plasma model (two-fluid model, infinite inertia of the ion component, van- 
ishing pressure of the electron component, linearization around background 
with vanishing electromagnetic field, etc.); it also depends on the kind of 
asymptotic limit considered. For the high frequency limit on a fixed back- 
ground, the eikonal equation is exactly the same as for light propagation in 
vacuum, i.e., there is no effect of the plasma on the rays. For the homogeneous 
background limit with fixed frequency, on the other hand, the eikonal equa- 
tion is given by (3.43), i.e., there is an effect of the plasma on the rays which 
causes, in particular, dispersion. Although the eikonal equation (3.43) has 
a product structure associated with three partial Hamiltonians, one should 
not speak of “multiple refraction” in this case. The reason is that only the 
transverse modes described by $H_3$ can be linked to solutions of the vacuum
eikonal equation in the way indicated above. In other words, rays that enter
into our plasma from an adjacent vacuum region have to proceed as 3-rays,
i.e., they are not multiply refracted. In a magnetized plasma, however, the
background electromagnetic field causes the branch of the dispersion relation
associated with $H_3$ to split into two branches. Then the medium becomes
double-refractive. On a special-relativistic background this is a standard
result of plasma physics. For a general-relativistic treatment of this case we
refer to Breuer and Ehlers [18] [19].

In addition to the high frequency limit on a fixed background and the 
homogeneous background limit with fixed frequency there are many other 
possibilities. To mention just one further example, we could modify ansatz 
(3.28)-(3.30) by assuming a different scaling behavior for the one-parameter 
family of background fields. In this way it is possible to derive, e.g., an eikonal 
equation such that the rays are directly affected by the rotation of the
background rest system $\mathring{U}^a$. An eikonal equation of this kind was brought forward
by Heintzmann and Schrüfer [61], based on earlier work by Heintzmann,
Kundt and Lasota [60] in the context of special relativity. For each eikonal 
equation derived that way, the range of validity as an approximation scheme 
(if any) must be checked individually. 




4. Introduction to Part II 



In Part II we treat ray optics as a theory in its own right. In Chap. 5 we 
presuppose an arbitrary finite-dimensional manifold M and set up a Hamil- 
tonian formalism for ray optics in the cotangent bundle over M. In Chap. 6 
we assume that, in addition, a Lorentzian metric is given on M. Specialized 
to the case $\dim(\mathcal{M}) = 4$, $(\mathcal{M}, g)$ can then be interpreted as a spacetime in
the sense of general relativity and our formalism covers ray optics in
arbitrary media on such a spacetime. This procedure has the advantage that the
results of Chap. 5 apply equally well to spacetime theories other than general
relativity and to the case that M is to be interpreted as space, rather than as 
spacetime, in any kind of theory where such a notion makes sense. Chapter 7 
will then be devoted to variational principles for rays and Chap. 8 presents 
applications of the general formalism to astrophysics and astronomy. 

The results of Part I will often be used for the sake of motivation, and 
they will provide us with illustrative examples. However, the mathematical 
formalism developed in Part II is completely self-contained. 



4.1 A brief guide to the literature 

In the following we make extensive use of Hamiltonian formalism. For the 
most part we use coordinate free notation since it is our goal to also treat 
some global questions. We assume that the reader is familiar with differential 
calculus and with symplectic geometry as it is used in the modern treatment 
of classical mechanics. Our standard reference for background material is the 
textbook by Abraham and Marsden [1]. In addition, we also refer to Arnold
[8] and to Woodhouse [150]. More particularly in view of optics, it might 
be helpful to consult the textbook by Guillemin and Sternberg [55] where 
applications of the Hamiltonian formalism and of symplectic geometry to 
optics are given in modern mathematical terminology. In traditional notation, 
applications of the Hamiltonian formalism to optics can be found, e.g., in 
the classical work of Carathéodory [25] and of Luneburg [88] [89]. Readers
interested in the historical roots of “Hamiltonian optics” should go back 
to the original work of Sir William Rowan Hamilton who established this 
formalism in the 1820s, see vol. 1 of the collected papers of Hamilton edited by 
Conway and Synge [29]. Next to the work of Hamilton, the most fundamental
contribution to the mathematical theory of ray optics is due to Bruns [23]
who introduced the so-called eikonal function. The relation of Bruns’s eikonal
function to Hamilton’s characteristic function is controversially discussed in 
articles by Herzberger [63] and Synge [139]. 

Textbooks on general relativity do not usually treat ray optics in detail. 
Most of them are restricted to light propagation in vacuo where the light rays 
are just the light-like geodesics of the spacetime metric. An important excep- 
tion to this rule is the book by Synge [142] where a Hamiltonian formalism 
for ray optics in isotropic media is discussed in some detail. It is worthwhile
to compare this to earlier work on ray optics by the same author, see Synge 
[138] [140] [141]. More recent work by Miron and Kawaguchi [97], also see 
Kawaguchi and Miron [68] [69], is strongly influenced by ideas of Synge but 
uses a more modern mathematical terminology. It is the main purpose of 
their work to develop a differential geometric formalism for light propagation 
in isotropic dispersive media. Miron and Kawaguchi repeatedly claim that 
standard symplectic geometry does not provide an appropriate framework 
for the treatment of such media. We do not share this point of view.

Having set up a Hamiltonian formalism for general-relativistic ray optics, 
the way is paved for characterizing rays by a variational principle. Some of 
these variational principles can be interpreted as general-relativistic versions 
of Fermat’s principle. The oldest versions, which hold for vacuum rays in
static or stationary spacetimes, date back to Weyl [149] and Levi-Civita [81].
Related material can be found in Levi-Civita [80] [82] [83] and Synge [137]. 
(The reader is cautioned that the latter paper does not meet the standard of 
Synge’s later work.) These versions of Fermat’s principle are also discussed in 
several modern textbooks and review articles, see, e.g., Frankel [43] or
Straumann [136] for the static case and Landau and Lifshitz [76] or Brill [21] for
the stationary case. For a discussion from a mathematical point of view we 
refer to Masiello [93]. Generalizations from vacuum to an isotropic medium,
but still assuming stationarity, were first considered by Pham Mau Quan 
[116] [117] [118] [119]. On the other hand, Uhlenbeck [144] found the first 
variational principle for (vacuum) light rays in general-relativistic spacetimes 
without symmetries. Whereas for the work of Uhlenbeck it was crucial that 
the spacetime be globally hyperbolic, Kovner [74] was able to formulate a 
Fermat principle for vacuum light rays in an arbitrary Lorentzian manifold. 
A rigorous proof that the solution curves of Kovner’s variational principle 
are, indeed, the light-like geodesics was given in Perlick [108]. Kovner’s
variational principle was further discussed, both from a physical and from a
mathematical point of view, e.g., by Faraoni [42], Nityananda and Samuel 
[101], Schneider, Ehlers and Falco [128], Bel and Martin [12], Perlick [110] 
and in several articles by Giannoni, Masiello and Piccione, see, e.g., Giannoni 
and Masiello [46] or Giannoni, Masiello and Piccione [47] [48].



4.2 Assumptions and notations

Throughout Part II we presuppose a finite-dimensional real $C^\infty$ manifold $\mathcal{M}$
whose topology satisfies Hausdorff’s axiom and the second axiom of
countability. This implies that $\mathcal{M}$ is paracompact. The terms “manifold” and
“submanifold” always mean manifold and submanifold without boundary. The
physical interpretation we have in mind refers to $\mathcal{M}$ as to a spacetime model
in the sense of general relativity. However, for the basic concepts of ray optics,
to be introduced and discussed in Chap. 5, we need no additional structure
on $\mathcal{M}$ and $n = \dim(\mathcal{M})$ need not be specified. In Chap. 6 we shall assume
that there is a Lorentzian metric $g$ given on $\mathcal{M}$, whereas $n = \dim(\mathcal{M})$
will still be an unspecified positive integer except for the restriction that, in
Chap. 6, we assume $n \ge 2$ to exclude some pathologies. In all applications to
relativity we use units making the vacuum velocity $c$ of light equal to one.

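At a point $q \in \mathcal{M}$, we denote the tangent space by $T_q\mathcal{M}$ and its dual,
the cotangent space, by $T_q^*\mathcal{M}$. The tangent bundle will be denoted by
$\tau_{\mathcal{M}} : T\mathcal{M} \to \mathcal{M}$ and the cotangent bundle by
$\tau^*_{\mathcal{M}} : T^*\mathcal{M} \to \mathcal{M}$. It will
often be necessary to remove the zero section from $T\mathcal{M}$ and from $T^*\mathcal{M}$;
what is left will be denoted by $\mathring{T}\mathcal{M}$ and $\mathring{T}^*\mathcal{M}$, respectively.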

By a “Lorentzian metric” we always mean a covariant symmetric second
rank tensor field with signature $(+, \dots, +, -)$. With respect to a Lorentzian
metric $g$, a linear subspace of the tangent space $T_q\mathcal{M}$ is called space-like if
on this subspace the metric $g$ is positive definite, light-like if it is positive
semidefinite but not positive definite, and time-like otherwise. A vector
$X \in T_q\mathcal{M}$ is called space-like, light-like, or time-like whenever the linear subspace
spanned by this vector has the respective property. As a consequence, $X$ is
space-like if $X = 0$ or $g(X, X) > 0$, light-like if $X \neq 0$ and $g(X, X) = 0$, and
time-like if $g(X, X) < 0$. If $X$ is space-like, light-like, or time-like, the same
property is assigned to the covector $g(X, \,\cdot\,)$. Finally, a submanifold of $\mathcal{M}$ is
called space-like, light-like, or time-like whenever its tangent space has the
respective property at all points.
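
The classification of single vectors can be written down directly. The sketch
below is only an illustration: the metric is passed as a nested list of
components at one point, the tolerance `tol` and the function names are
assumptions introduced here, and the zero vector is counted as space-like in
line with the convention just stated.

```python
def metric_value(g, X):
    """Evaluate g(X, X) for metric components g[a][b] and vector components X[a]."""
    n = len(X)
    return sum(g[a][b] * X[a] * X[b] for a in range(n) for b in range(n))


def causal_character(g, X, tol=1.0e-12):
    """Classify X as 'space-like', 'light-like' or 'time-like' with respect to g."""
    if all(abs(component) <= tol for component in X):
        return "space-like"  # the zero vector is space-like by the convention above
    value = metric_value(g, X)
    if value > tol:
        return "space-like"
    if value < -tol:
        return "time-like"
    return "light-like"


# Minkowski metric with signature (+, +, +, -) as a test case.
minkowski = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
]
print(causal_character(minkowski, [1.0, 0.0, 0.0, 1.0]))
```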

With a Lorentzian metric (or, more generally, a pseudo-Riemannian metric
of any signature) on $\mathcal{M}$ there is associated its Levi-Civita connection $\nabla$.
This defines the notions of parallel transport and of geodesics. By a geodesic
we always mean a map $\lambda : I \to \mathcal{M}$ from a real interval into $\mathcal{M}$ such that
$\nabla_{\dot\lambda} \dot\lambda$ is parallel to $\dot\lambda$. Such a curve is called an affinely parametrized geodesic
if, more specifically, the equation $\nabla_{\dot\lambda} \dot\lambda = 0$ is satisfied.

We assume that the reader is familiar with exterior calculus. As to the
definition of antisymmetric tensor product, exterior derivative, etc., our sign
and factor conventions follow Abraham and Marsden [1]. Whenever referring
to a local chart $(x^1, \dots, x^n)$ on $\mathcal{M}$, we use Einstein’s summation convention
with latin indices running from 1 to $n$ and greek indices running from 1 to
$n - 1$. With respect to such a local chart, elements of $T\mathcal{M}$ can be represented
in the form $v^a\, \partial/\partial x^a$, and elements of $T^*\mathcal{M}$ can be represented in the form
$p_a\, dx^a$. In this way we get a local chart $(x^1, \dots, x^n, v^1, \dots, v^n)$ on $T\mathcal{M}$, and a
local chart $(x^1, \dots, x^n, p_1, \dots, p_n)$ on $T^*\mathcal{M}$. Following Abraham and Marsden
[1] we refer to charts constructed in this way as natural charts induced by
$(x^1, \dots, x^n)$. It is also usual to refer to the $v^a$ as to velocity coordinates and
to the $p_a$ as to momentum coordinates conjugate to the $x^a$. Occasionally we
also refer to elements of $T\mathcal{M}$ as to velocity vectors and to elements of $T^*\mathcal{M}$
as to momentum covectors.

It is well known and easily verified that there is a (unique and globally
well-defined) one-form $\theta$ on $T^*\mathcal{M}$ such that $\theta = p_a\, dx^a$ in any natural chart.
$\theta$ is known as the canonical one-form on $T^*\mathcal{M}$. It can be characterized in a
coordinate-free manner as the unique one-form on $T^*\mathcal{M}$ such that $\beta^*\theta = \beta$
for all $C^\infty$ sections $\beta : \mathcal{M} \to T^*\mathcal{M}$, see Abraham and Marsden [1], p. 179.
(Here, $\beta^*\theta$ denotes the pull-back of $\theta$ with $\beta$.) The two-form $\Omega = -d\theta$, which
is known as the canonical two-form on $T^*\mathcal{M}$, takes the form $\Omega = dx^a \wedge dp_a$ in
any natural chart. More generally, any chart on $T^*\mathcal{M}$ in which $\Omega$ takes this
special form is called a canonical chart. It is obvious that $\Omega$ is closed (i.e.,
$d\Omega = 0$) and non-degenerate (i.e., the equation $\Omega(X, \,\cdot\,) = 0$ implies $X = 0$).
Hence, $\Omega$ makes $T^*\mathcal{M}$ into a symplectic manifold. The restrictions of $\theta$ and
$\Omega$ to $\mathring{T}^*\mathcal{M}$ will again be denoted by $\theta$ and $\Omega$ for the sake of simplicity.

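$\Omega$ can be used to assign to each $C^\infty$ function $H : T^*\mathcal{M} \to \mathbb{R}$ a
Hamiltonian vector field $X_H$ on $T^*\mathcal{M}$ by the formula

$$ \Omega(X_H, \,\cdot\,) = dH . \tag{4.1} $$

The non-degeneracy of $\Omega$ guarantees that the assignment $H \mapsto X_H$ is,
indeed, well defined. In a natural chart, the Hamiltonian vector field $X_H$
takes the form

$$ X_H = \frac{\partial H}{\partial p_a} \frac{\partial}{\partial x^a}
       - \frac{\partial H}{\partial x^a} \frac{\partial}{\partial p_a} . \tag{4.2} $$

A $C^\infty$ curve $\xi : I \to T^*\mathcal{M}$, defined on a real interval $I$, is called a solution
of Hamilton’s equations iff it is an integral curve of the Hamiltonian vector
field $X_H$, i.e., iff

$$ \Omega_{\xi(s)}\big(\dot\xi(s), \,\cdot\,\big) = (dH)_{\xi(s)} \tag{4.3} $$

for all $s \in I$. If $\xi$ is represented in a natural chart as a map $s \mapsto (x(s), p(s))$,
(4.3) takes the familiar canonical form of Hamilton’s equations

$$ \dot{x}^a(s) = \frac{\partial H}{\partial p_a}\big(x(s), p(s)\big) , \tag{4.4} $$

$$ \dot{p}_a(s) = - \frac{\partial H}{\partial x^a}\big(x(s), p(s)\big) . \tag{4.5} $$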

Equation (4.4), which gives the velocity coordinates as functions of the
position and momentum coordinates, is properly called the vertical part of
Hamilton’s equations since it corresponds to equation (4.3) applied to
vectors tangent to the fibers of $T^*\mathcal{M}$. If (4.4) can be solved for the momentum
coordinates $p_a$, the result can be inserted into (4.5). This leaves us with a
set of second order equations for the position coordinates $x^a$. Locally around
some point $u \in T^*\mathcal{M}$, (4.4) can be solved for the momentum coordinates $p_a$
if and only if the condition

$$ \det\big(H^{ab}\big) \neq 0 \tag{4.6} $$

holds at $u$. Here and in the following, we use the abbreviations

$$ H^a = \frac{\partial H}{\partial p_a} , \qquad
   H^{ab} = \frac{\partial^2 H}{\partial p_a\, \partial p_b} , \qquad \text{etc.} \tag{4.7} $$

A Hamiltonian $H$ that satisfies (4.6) at $u$ is called regular or non-degenerate
at $u$. (It is easy to check that (4.6) holds in any natural chart if it holds in
just one natural chart.) Hence, locally Hamilton’s equations can be viewed
as a system of second order differential equations on $\mathcal{M}$ if and only if $H$ is
everywhere regular.
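
Condition (4.6) is easy to probe numerically. The following sketch is an
illustration only: it assumes a two-dimensional base manifold, approximates the
matrix of second momentum derivatives by central differences, and compares two
hypothetical Hamiltonians, one quadratic in the momenta and one linear in them.
All function names and the step size `eps` are choices made here.

```python
def hessian_p(H, x, p, eps=1.0e-4):
    """Central-difference approximation of the matrix H^{ab} = d^2 H / dp_a dp_b."""
    n = len(p)
    hess = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            def shifted(sign_a, sign_b):
                q = list(p)
                q[a] += sign_a * eps
                q[b] += sign_b * eps
                return H(x, q)
            hess[a][b] = (
                shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
            ) / (4.0 * eps * eps)
    return hess


def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def h_quadratic(x, p):
    """0.5 * g^{ab} p_a p_b with g^{ab} = diag(+1, -1): quadratic in the momenta."""
    return 0.5 * (p[0] ** 2 - p[1] ** 2)


def h_linear(x, p):
    """U^a p_a with the constant vector field U = (0, 1): linear in the momenta."""
    return p[1]


point, momentum = [0.0, 0.0], [0.3, 0.7]
print(det2(hessian_p(h_quadratic, point, momentum)))  # close to -1: regular
print(det2(hessian_p(h_linear, point, momentum)))     # close to 0: not regular
```

For the linear Hamiltonian (4.4) fixes the velocity independently of the
momentum, so it cannot be solved for the momentum coordinates.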

To express regularity in invariant notation, without referring to a natural
chart, one introduces the fiber derivative of a function $H : T^*\mathcal{M} \to \mathbb{R}$ in
the following way. For $q \in \mathcal{M}$, we denote the restriction of $H$ to $T_q^*\mathcal{M}$ by
$H_q$; then, for each $u \in T_q^*\mathcal{M}$, the differential $(dH_q)_u : T_q^*\mathcal{M} \to \mathbb{R}$, being a
linear map, can be viewed as an element of $(T_q^*\mathcal{M})^* \cong T_q\mathcal{M}$. The fiber
derivative $\mathbb{F}H : T^*\mathcal{M} \to T\mathcal{M}$ of $H$ is defined by the equation $(\mathbb{F}H)(u) = (dH_q)_u$
where $q = \tau^*_{\mathcal{M}}(u)$. Using natural charts on $T^*\mathcal{M}$ and $T\mathcal{M}$, induced by one
and the same chart $x = (x^1, \dots, x^n)$ on $\mathcal{M}$, the fiber derivative takes the form
$(x, p) \mapsto (x, v = \partial H/\partial p)$. With the help of the map $\mathbb{F}H$ the vertical part of
Hamilton’s equations (4.3) can be written in the form
$(\tau^*_{\mathcal{M}} \circ \xi)^{\boldsymbol{\cdot}} = \mathbb{F}H \circ \xi$
where the ring denotes composition of maps and the dot stands for derivative
with respect to the curve parameter. Hence the desired invariant
characterization of regularity can be given in the following way. A Hamiltonian is regular
at a point $u \in T^*\mathcal{M}$ if and only if its fiber derivative $\mathbb{F}H$ maps a neighborhood
of $u$ diffeomorphically onto an open subset of $T\mathcal{M}$. If $\mathbb{F}H : T^*\mathcal{M} \to T\mathcal{M}$ is
even a global diffeomorphism, $H$ is called hyperregular.




5. Ray-optical structures 
on arbitrary manifolds 



In Part I we have seen that the fundamental object on which all of ray 
optics can be based is a dispersion relation, i.e., a characteristic variety in 
a cotangent bundle. We make this into the general definition of ray optical 
structures on our arbitrary n-dimensional manifold. 



5.1 Definition and basic properties 
of ray-optical structures 

When we performed the passage from Maxwell’s equations to ray optics in 
Part I, the characteristic variety came about as the zero-level surface of a 
Hamiltonian function H, where H was a smooth function on the punctured 
cotangent bundle whose derivative with respect to the momentum coordi- 
nates had no zeros. It is therefore natural to define a ray-optical structure 
on our arbitrary n-dimensional manifold as a closed codimension-one
submanifold $\mathcal{N}$ of $\mathring{T}^*\mathcal{M}$ which is everywhere transverse to the fibers. If we add
the condition that $\mathcal{N}$ covers all points of $\mathcal{M}$, we are led to the following
definition which is fundamental for all of Part II.

Definition 5.1.1. A ray-optical structure on $\mathcal{M}$ is a $(2n - 1)$-dimensional
closed embedded $C^\infty$ submanifold $\mathcal{N}$ of $\mathring{T}^*\mathcal{M}$ such that
$\tau^*_{\mathcal{M}}|_{\mathcal{N}} : \mathcal{N} \to \mathcal{M}$ is a surjective submersion.

Here $\tau^*_{\mathcal{M}}|_{\mathcal{N}}$ denotes the restriction of the bundle projection
$\tau^*_{\mathcal{M}} : T^*\mathcal{M} \to \mathcal{M}$ to $\mathcal{N}$. The condition of $\tau^*_{\mathcal{M}}|_{\mathcal{N}}$ being a submersion
guarantees that $\mathcal{N}$ is transverse to the fibers of $\mathring{T}^*\mathcal{M}$, whereas the condition of
$\tau^*_{\mathcal{M}}|_{\mathcal{N}}$ being surjective guarantees that $\mathcal{N}$ covers all of $\mathcal{M}$. Both
conditions together imply that the set $\mathcal{N}_q = \mathcal{N} \cap \mathring{T}_q^*\mathcal{M}$ is a codimension-one
submanifold of the punctured cotangent space $\mathring{T}_q^*\mathcal{M}$ for all $q \in \mathcal{M}$. As $\mathcal{N}$ is
closed in $\mathring{T}^*\mathcal{M}$, so is $\mathcal{N}_q$ in $\mathring{T}_q^*\mathcal{M}$. Note that, for two different points $q$ and
$q'$, the manifolds $\mathcal{N}_q$ and $\mathcal{N}_{q'}$ are not necessarily diffeomorphic. In particular,
$\mathcal{N}$ need not be a fiber bundle over $\mathcal{M}$. This will be exemplified below.

According to Definition 5.1.1, $\mathcal{N}$ need not be closed in $T^*\mathcal{M}$ and its
closure in $T^*\mathcal{M}$ might fail to be a smooth manifold at the zero section.
In view of the examples we have in mind, it is indeed necessary to keep
Definition 5.1.1 as general as that. In particular, we do not want to exclude
the case that $\mathcal{N}$ is the null cone bundle of a Lorentzian metric.

If $\mathcal{N}_1$ and $\mathcal{N}_2$ are two ray-optical structures on $\mathcal{M}$ with
$\mathcal{N}_1 \cap \mathcal{N}_2 = \emptyset$, then $\mathcal{N} = \mathcal{N}_1 \cup \mathcal{N}_2$ is again a ray-optical
structure on $\mathcal{M}$. Conversely, each connected component of a ray-optical
structure is a ray-optical structure in its own right.

Ray-optical structures that come about as level surfaces of Hamiltonian 
functions are characterized by the following proposition. 

Proposition 5.1.1. Fix a $C^\infty$ function $H : T^*\mathcal{M} \to \mathbb{R}$ and let $\mathcal{N}$ be the
zero-level surface of $H$, i.e., $\mathcal{N} = \{ u \in \mathring{T}^*\mathcal{M} \mid H(u) = 0 \}$. Then $\mathcal{N}$ is a
ray-optical structure on $\mathcal{M}$, provided that $H$ satisfies the following two properties.

(a) For all $q \in \mathcal{M}$, the set $\mathcal{N}_q = \{ u \in \mathring{T}_q^*\mathcal{M} \mid H(u) = 0 \}$ is non-empty.

(b) For all $u \in \mathcal{N}$, the fiber derivative of $H$ satisfies $(\mathbb{F}H)(u) \neq 0$.

Proof. Condition (a) guarantees that the map $\tau^*_{\mathcal{M}}|_{\mathcal{N}}$ is surjective. Condition
(b), which is the coordinate-free way of saying that the derivative of $H$ with
respect to the momentum coordinates has no zeros, guarantees that $\mathcal{N}$ is a
closed embedded codimension-one submanifold of $\mathring{T}^*\mathcal{M}$ and that $\tau^*_{\mathcal{M}}|_{\mathcal{N}}$ is a
submersion. □
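
Condition (b) concerns the function $H$ and not only its zero set, which a small
numerical experiment can illustrate. The sketch below rests on assumptions made
here for illustration: a single fiber over a two-dimensional base manifold, a
fiber metric with components diag(+1, -1), and a second, hypothetical function
`h_squared` that has the same zero set as the quadratic Hamiltonian but a
vanishing momentum derivative on it. The derivative is approximated by central
differences.

```python
def fiber_derivative(H, p, eps=1.0e-4):
    """Central-difference approximation of dH/dp_a within one fiber."""
    grad = []
    for a in range(len(p)):
        up = list(p)
        down = list(p)
        up[a] += eps
        down[a] -= eps
        grad.append((H(up) - H(down)) / (2.0 * eps))
    return grad


def h_quadratic(p):
    """0.5 * g^{ab} p_a p_b with g^{ab} = diag(+1, -1)."""
    return 0.5 * (p[0] ** 2 - p[1] ** 2)


def h_squared(p):
    """Hypothetical function with the same zero set as h_quadratic."""
    return 0.5 * (p[0] ** 2 - p[1] ** 2) ** 2


null_covector = [1.0, 1.0]  # lies on the common zero set of both functions
print(fiber_derivative(h_quadratic, null_covector))  # close to [1, -1]
print(fiber_derivative(h_squared, null_covector))    # close to [0, 0]
```

Both functions vanish on the same null cone, but only the first one satisfies
condition (b) there, so the proposition can be applied to the first function
and not to the second.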




Fig. 5.1. For the ray-optical structure of Example 5.1.1, $\mathcal{N}_q$ is the null cone of the
metric $g_o$ at each point $q \in \mathcal{M}$.







A ray-optical structure need not be generated globally by a function H in 
this way. In general, such a function H exists only locally. (Later in this sec- 
tion we shall investigate this question in more detail.) Insofar, Definition 5.1.1 
applies to situations more general than those encountered in Part I. Such a 
generalization seems to be reasonable since the treatment of Part I was local 
throughout. 

Next we mention some examples of ray-optical structures. They will serve 
as our standard examples for the discussion of all properties of ray-optical 
structures. Therefore the reader is requested to commit them to his or her 
memory. 

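Example 5.1.1. Let $g_o$ be a $C^\infty$ Lorentzian metric on $\mathcal{M}$ and denote the
induced fiber metric on $T^*\mathcal{M}$ by $g_o^\sharp$. (In other words, if the components
of $g_o$ are denoted by $(g_o)_{ab}$, the components of $g_o^\sharp$ are given by $g_o^{ab}$ with
$g_o^{ab}\, (g_o)_{bc} = \delta^a_c$.) Define $H : T^*\mathcal{M} \to \mathbb{R}$ by $H(u) = \tfrac{1}{2}\, g_o^\sharp(u, u)$, i.e.,

$$ H(x, p) = \tfrac{1}{2}\, g_o^{ab}(x)\, p_a p_b \tag{5.1} $$

in terms of natural coordinates. Then $\mathcal{N} = \{ u \in \mathring{T}^*\mathcal{M} \mid H(u) = 0 \}$ is a
ray-optical structure on $\mathcal{M}$.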

For the case $\dim(\mathcal{M}) = 4$ this example admits several different
interpretations on the basis of general relativity. The first possibility is to interpret $g_o$
as the spacetime metric such that $\mathcal{N}$ gives light propagation in vacuum. The
second possibility is to interpret go as the optical metric (2.88) in a linear 
dielectric and permeable medium which is isotropic. The third possibility is 
to interpret go as one of the two optical metrics (2.91) in a linear dielectric 
and permeable medium which is anisotropic with the special features typ- 
ical of a uniaxial crystal. In the latter case it is important to realize that, 
in general, the two optical metrics must be treated separately; the union of 
the two “light cone bundles” is not a ray-optical structure in the sense of 
Definition 5.1.1 unless they are disjoint. 

We should keep in mind that Example 5.1.1 is not general enough to cover 
light propagation in arbitrary linear dielectric and permeable media. Accord- 
ing to Sect. 2.5 this would require a generalization to Finslerian metrics. 

Here is another example of a ray-optical structure. 

Example 5.1.2. Let $g_o$ and $g_o^\sharp$ be as in Example 5.1.1 and define a function
$H : T^*\mathcal{M} \to \mathbb{R}$ by $H(u) = \tfrac{1}{2} \big( g_o^\sharp(u, u) + 1 \big)$, i.e.,

$$ H(x, p) = \tfrac{1}{2} \big( g_o^{ab}(x)\, p_a p_b + 1 \big) \tag{5.2} $$

in terms of natural coordinates. Then $\mathcal{N} = \{ u \in \mathring{T}^*\mathcal{M} \mid H(u) = 0 \}$ is a
ray-optical structure on $\mathcal{M}$.

Fig. 5.2. For the ray-optical structure of Example 5.1.2, $\mathcal{N}_q$ is a two-shell
hyperboloid at each point $q \in \mathcal{M}$.



According to Chap. 3, light propagation in a non-magnetized plasma is
given by a ray-optical structure of this sort, generated by the (partial)
Hamiltonian (3.46). Here we have to put $g_o = \omega_p^2\, g$, where $g$ is the spacetime metric
and $\omega_p$ is the plasma frequency (3.51), and we have to restrict ourselves to
regions where $\omega_p$ has no zeros.

Example 5.1.1 and Example 5.1.2 are special cases of the following more
general construction. Let $g$ be a $C^\infty$ Lorentzian metric on $\mathcal{M}$ and denote
the induced fiber metric on $T^*\mathcal{M}$ by $g^\sharp$. Fix a $C^\infty$ function $h : \mathcal{M} \to \mathbb{R}$
and define $H : T^*\mathcal{M} \to \mathbb{R}$ by
$H(u) = \tfrac{1}{2} \big( g^\sharp(u, u) + h(\tau^*_{\mathcal{M}}(u)) \big)$. Then
$\mathcal{N} = \{ u \in \mathring{T}^*\mathcal{M} \mid H(u) = 0 \}$ is a ray-optical structure on $\mathcal{M}$. In regions
where $h$ vanishes this leads us back to Example 5.1.1 with $g_o = g$. In
regions where $h$ is strictly positive this leads us back to Example 5.1.2 with
$g_o = h\, g$. This generalized example has relevance for light propagation in
a non-magnetized plasma which occupies only part of the spacetime region
considered. Mathematically it exemplifies our earlier remark that, for an
arbitrary ray-optical structure $\mathcal{N}$ on $\mathcal{M}$, the manifolds $\mathcal{N}_q$ and $\mathcal{N}_{q'}$ need not
be diffeomorphic for two points $q$ and $q'$ in $\mathcal{M}$. If $h(q) = 0$ and $h(q') > 0$, $\mathcal{N}_q$
is a double cone whereas $\mathcal{N}_{q'}$ is a two-shell hyperboloid, i.e., $\mathcal{N}_q$ and $\mathcal{N}_{q'}$ are
not even homeomorphic, let alone diffeomorphic.
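
The change of shape between the two cases can be made concrete in a single
fiber. The sketch below is illustrative and rests on assumptions introduced
here: a frame at the point in question in which the fiber metric has components
diag(+1, ..., +1, -1), a covector split into a spatial part of length `k_norm`
and a last component `omega`, and a value `h >= 0`. The zero-level condition
then reads k_norm^2 - omega^2 + h = 0.

```python
import math


def shell_frequencies(k_norm, h):
    """Return the two roots omega of k_norm^2 - omega^2 + h = 0 for h >= 0."""
    if h < 0.0:
        raise ValueError("this sketch only covers h >= 0")
    omega = math.sqrt(k_norm * k_norm + h)
    return (-omega, omega)


def shell_gap(h):
    """Separation of the two sheets at k_norm = 0, where they come closest."""
    lower, upper = shell_frequencies(0.0, h)
    return upper - lower


print(shell_gap(0.0))  # the two sheets meet at the (removed) zero covector
print(shell_gap(4.0))  # the two sheets are separated
```

For vanishing `h` the two sheets approach each other at the zero covector,
which is excluded from the punctured cotangent space; for positive `h` they are
separated by a gap of twice the square root of `h`.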

We now turn to a different kind of examples for ray-optical structures. 

Example 5.1.3. Fix a $C^\infty$ vector field $U$ on $\mathcal{M}$ that has no zeros and define
a function $H : T^*\mathcal{M} \to \mathbb{R}$ by $H(u) = u(U_q)$ for all $q \in \mathcal{M}$ and $u \in T_q^*\mathcal{M}$,
i.e.,

$$ H(x, p) = U^a(x)\, p_a \tag{5.3} $$

in terms of natural coordinates. Then the set $\mathcal{N} = \{ u \in \mathring{T}^*\mathcal{M} \mid H(u) = 0 \}$
is a ray-optical structure on $\mathcal{M}$.

In our discussion of light propagation in a non-magnetized plasma we 
encountered the (partial) Hamiltonian (3.44) which generates a ray-optical 
structure of this sort. This example will be useful in the following to demon- 
strate some possible pathologies. 

Example 5.1.3 admits the following generalization which is mathematically
interesting although somewhat contrived in view of physical applications.
Instead of a $C^\infty$ vector field $U$ without zeros, it suffices to have a $C^\infty$
line field $L$, i.e., a map that assigns to each point $q \in \mathcal{M}$ a one-dimensional
subspace $L_q$ of the tangent space $T_q\mathcal{M}$. Then we define $\mathcal{N}_q$ as the set of all
covectors in $\mathring{T}_q^*\mathcal{M}$ that annihilate all vectors in $L_q$. It is easy to check that
$\mathcal{N} = \{ u \in \mathcal{N}_q \mid q \in \mathcal{M} \}$ is indeed a ray-optical structure on $\mathcal{M}$. If $L$ is not
globally spanned by a $C^\infty$ vector field without zeros, this ray-optical
structure $\mathcal{N}$ is not globally generated by a Hamiltonian function, i.e., it is not of
the kind considered in Proposition 5.1.1.




Fig. 5.3. For the ray-optical structure of Example 5.1.3, $\mathcal{N}_q$ is a
punctured hyperplane (a), whereas for Example 5.1.4 it is a pair of
hyperplanes (b).



The following example is related to Example 5.1.3 in a similar way as 
Example 5.1.2 is related to Example 5.1.1. 

Example 5.1.4. Let $U$ be a $C^\infty$ vector field on $\mathcal{M}$ that has no
zeros and define a function $H: T^*\mathcal{M} \to \mathbb{R}$ by
$H(u) = \frac{1}{2}\big((u(U_q))^2 - 1\big)$ for all points $q \in \mathcal{M}$
and for all covectors $u \in T^*_q\mathcal{M}$, i.e.,

    $H(x,p) = \frac{1}{2}\big(U^a(x)\, U^b(x)\, p_a p_b - 1\big)$    (5.4)

in terms of natural coordinates. Then the set
$\mathcal{N} = \{u \in \mathring{T}^*\mathcal{M} \mid H(u) = 0\}$
is a ray-optical structure on $\mathcal{M}$.

The partial Hamiltonian (3.45), which determines the plasma oscillations
in a non-magnetized plasma, generates a ray-optical structure of this kind.
Here we have to divide the vector field $\mathring{U}$ of (3.45) by the plasma
frequency $\omega_p$, given by (3.51), to get the vector field $U$, and we have
to restrict to regions where the plasma frequency has no zeros.




Fig. 5.4. For the ray-optical structure of Example 5.1.5, $\mathcal{N}_q$ is a sphere.



Finally, we mention an example that has no physical relevance if M is 
interpreted as a spacetime manifold. However, it is the most important ex- 
ample of a ray-optical structure if M is interpreted as space, e.g., in ordinary 
optics or in a static general-relativistic spacetime. In the latter context, we 
shall examine this example in more detail in Sect. 6.5 below. 

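Example 5.1.5. Let $g_+$ be a (positive definite) Riemannian metric on
$\mathcal{M}$ and denote the induced fiber metric on $T^*\mathcal{M}$ by
$g_+^\#$. Define a Hamiltonian $H: T^*\mathcal{M} \to \mathbb{R}$ by
$H(u) = \frac{1}{2}\big(g_+^\#(u,u) - 1\big)$, i.e.,

    $H(x,p) = \frac{1}{2}\big(g_+^{ab}(x)\, p_a p_b - 1\big)$    (5.5)

in terms of natural coordinates. Then
$\mathcal{N} = \{u \in \mathring{T}^*\mathcal{M} \mid H(u) = 0\}$ is a
ray-optical structure on $\mathcal{M}$.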

This ends our list of examples to which we shall come back frequently. 

It is now our goal to justify the term “ray-optical structure” by showing
that, indeed, any such structure gives rise to a notion of rays. If
$\mathcal{N}$ is a ray-optical structure on $\mathcal{M}$, we can use the
inclusion map $i: \mathcal{N} \to \mathring{T}^*\mathcal{M}$ to pull back the
canonical one-form $\theta$ and the canonical two-form $\Omega$ to
$\mathcal{N}$. The resulting forms will be denoted by

    $\theta_{\mathcal{N}} = i^*\theta$ and $\Omega_{\mathcal{N}} = i^*\Omega$.    (5.6)

Since the exterior derivative $d$ commutes with the pull-back operation $i^*$,
these two forms are related by the equation
$\Omega_{\mathcal{N}} = d\theta_{\mathcal{N}}$. In particular, this implies that
$\Omega_{\mathcal{N}}$ is a closed two-form, $d\Omega_{\mathcal{N}} = 0$.
Moreover, at each point $u \in \mathcal{N}$ the kernel of
$(\Omega_{\mathcal{N}})_u$ is one-dimensional since $\Omega$ is non-degenerate
and $\mathcal{N}$ has codimension one. This shows that
$(\mathcal{N}, \Omega_{\mathcal{N}})$ is a contact manifold, see
Abraham and Marsden [1], Definition 5.1.4.

A vector field $X$ on $\mathcal{N}$ is called a characteristic vector field iff
it satisfies the equation $\Omega_{\mathcal{N}}(X, \cdot\,) = 0$. As the kernel
of $\Omega_{\mathcal{N}}$ is one-dimensional, any two characteristic vector
fields are linearly dependent. Integral curves of characteristic vector fields
(or their projections to $\mathcal{M}$) are often called characteristic
curves. They give us the rays of $\mathcal{N}$, according to the following
definition.

Definition 5.1.2. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$ and let $\Omega_{\mathcal{N}}$ be the contact two-form on
$\mathcal{N}$ defined by (5.6). A $C^\infty$ immersion
$\xi: I \to \mathcal{N}$ from a real interval $I$ into $\mathcal{N}$ is called
a lifted ray iff

    $(\Omega_{\mathcal{N}})_{\xi(s)}\big(\dot{\xi}(s), \cdot\,\big) = 0$    (5.7)

for all $s \in I$. Then the projected curve
$\tau^*_{\mathcal{M}} \circ \xi: I \to \mathcal{M}$ is called a ray.

Clearly, lifted rays satisfy the following existence and (non-) uniqueness 
properties. 

(a) A lifted ray remains a lifted ray under an arbitrary reparametrization 
(which need not be orientation-preserving). 

(b) Through each point $u \in \mathcal{N}$ there is a lifted ray, and it is
unique up to reparametrization and extension.

Moreover, it is important to realize that a lifted ray is nowhere tangent to a
fiber of $T^*\mathcal{M}$. (This follows from the fact that $\mathcal{N}$ is
everywhere transverse to the fibers.) Hence, every ray is an immersed curve in
$\mathcal{M}$. In other words, “rays do not stand still in $\mathcal{M}$”.

In the case of Example 5.1.1 the rays are the light-like geodesics of the
metric $g_o$ whereas in the case of Example 5.1.2 they are the time-like
geodesics of the metric $g_o$. In the case of Example 5.1.3 and 5.1.4 an
immersed curve in $\mathcal{M}$ is a ray iff it is an integral curve of the
vector field $U$. In the case of Example 5.1.5 the rays are the immersed
geodesics of the metric $g_+$.

It follows immediately from the definitions that two ray-optical structures
on $\mathcal{M}$ are equal if and only if they determine the same set of lifted
rays. However, it is very well possible for two different ray-optical
structures to have the same rays. As an example we may consider a ray-optical
structure constructed from a (positive definite) Riemannian metric $g_+$ as in
Example 5.1.5 such that the rays are the geodesics of the metric $g_+$. If we
change $g_+$ by multiplication with a positive constant $c \neq 1$ we get a
different ray-optical structure but the set of rays remains unchanged. This is
obvious since $g_+$ and $c\,g_+$ have the same Levi-Civita connection.

So far our notion of rays made no use of Hamiltonian functions. To cast the
ray equation (5.7) into Hamiltonian form we need the following proposition
which can be viewed as the converse of Proposition 5.1.1.

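Proposition 5.1.2. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$ and fix a point $u \in \mathcal{N}$. Then there is an open
neighborhood $\mathcal{W}$ of $u$ in $\mathring{T}^*\mathcal{M}$ and a
$C^\infty$ function $H: \mathcal{W} \to \mathbb{R}$ with the following
properties.

(a) $\mathcal{N} \cap \mathcal{W} = \{w \in \mathcal{W} \mid H(w) = 0\}$;

(b) $dH$ has no zeros on $\mathcal{N} \cap \mathcal{W}$.

Any such function $H$ is called a local Hamiltonian for $\mathcal{N}$. $H$ is
called a global Hamiltonian for $\mathcal{N}$ if all of $\mathcal{N}$ is
covered by $\mathcal{W}$, i.e., if $\mathcal{N} \subseteq \mathcal{W}$.

Proof. This is an immediate consequence of the fact that, by Definition 5.1.1,
$\mathcal{N}$ is an embedded codimension-one $C^\infty$ submanifold of
$\mathring{T}^*\mathcal{M}$. □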

Later we shall give criteria for the existence of global Hamiltonians, see 
Proposition 5.1.4 below. 

We shall now rewrite the ray equation (5.7) in Hamiltonian form. To begin
with, let us consider the special case of a ray-optical structure
$\mathcal{N}$ on $\mathcal{M}$ that admits a global Hamiltonian
$H: \mathcal{W} \to \mathbb{R}$ and let us denote the Hamiltonian vector field
of $H$ by $X_H$. Then, at all points $u \in \mathcal{N}$, the vector
$(X_H)_u$ is non-zero and tangent to $\mathcal{N}$. Hence,
$X_H|_{\mathcal{N}}$ gives a nowhere vanishing vector field on $\mathcal{N}$.
If the defining equation (4.1) of $X_H$ is pulled back to $\mathcal{N}$, we
find $\Omega_{\mathcal{N}}(X_H|_{\mathcal{N}}, \cdot\,) = 0$, i.e.,
$X_H|_{\mathcal{N}}$ is a characteristic vector field. Thus, any other
characteristic vector field on $\mathcal{N}$ must be a multiple of
$X_H|_{\mathcal{N}}$. This implies that an immersion $\xi: I \to \mathcal{N}$
is a lifted ray if and only if its tangent field is everywhere a multiple of
$X_H$. Thus, lifted rays $\xi$ are characterized by the equations

    $H(\xi(s)) = 0$,    (5.8)

    $\dot{\xi}(s) = k(s)\, (X_H)_{\xi(s)}$,    (5.9)

where $k$ is a nowhere vanishing but otherwise arbitrary function. The freedom
to choose this function at will corresponds to the fact that lifted rays can be
arbitrarily reparametrized.

If $H$ is a local rather than a global Hamiltonian, this result remains true
on that part of $\mathcal{N}$ which is covered by $\mathcal{W}$. This implies
that, with respect to a natural chart and a local Hamiltonian, lifted rays are
characterized by the equations

    $H\big(x(s), p(s)\big) = 0$,    (5.10)

    $\dot{x}^a(s) = k(s)\, \frac{\partial H}{\partial p_a}\big(x(s), p(s)\big)$,    (5.11)

    $\dot{p}_a(s) = -k(s)\, \frac{\partial H}{\partial x^a}\big(x(s), p(s)\big)$.    (5.12)



These considerations can be summarized in the following way. If we are lucky
enough to have a global Hamiltonian $H$ for $\mathcal{N}$, the lifted rays of
$\mathcal{N}$ can be found by (i) solving Hamilton’s equations with this
function $H$; (ii) singling out those solutions that lie in $\mathcal{N}$;
(iii) allowing for arbitrary reparametrizations. If there is no global
Hamiltonian, the same procedure can be carried through on the domain of each
local Hamiltonian. On the mutual overlaps of those domains the lifted rays have
to be patched up.
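
The three steps can be tried out numerically. The following is a minimal
sketch, not a definitive implementation: it takes Example 5.1.5 on the plane
with the hypothetical conformally flat metric $g_+ = n^2\,\delta$, where the
"refractive index" $n(x,y) = 1 + 0.2\,y$, the Runge-Kutta integrator, the step
size, and all function names are assumptions made for illustration.

```python
import math

def n(x, y):
    # Hypothetical refractive index (assumption for illustration only).
    return 1.0 + 0.2 * y

def grad_n(x, y):
    # Gradient of the hypothetical index above.
    return (0.0, 0.2)

def hamiltonian(state):
    # H(x, p) = 1/2 (g_+^{ab} p_a p_b - 1) with g_+^{ab} = delta^{ab} / n^2.
    x, y, px, py = state
    return 0.5 * ((px * px + py * py) / n(x, y) ** 2 - 1.0)

def rhs(state, k):
    # Right-hand sides of (5.11) and (5.12) for the stretching factor k.
    x, y, px, py = state
    nn = n(x, y)
    gx, gy = grad_n(x, y)
    p2 = px * px + py * py
    dH_dp = (px / nn**2, py / nn**2)
    dH_dx = (-p2 * gx / nn**3, -p2 * gy / nn**3)
    return (k * dH_dp[0], k * dH_dp[1], -k * dH_dx[0], -k * dH_dx[1])

def rk4_step(state, s, ds, k):
    # One classical Runge-Kutta step; k is a function of the parameter s.
    def shifted(base, slope, factor):
        return tuple(b + factor * m for b, m in zip(base, slope))
    k1 = rhs(state, k(s))
    k2 = rhs(shifted(state, k1, ds / 2), k(s + ds / 2))
    k3 = rhs(shifted(state, k2, ds / 2), k(s + ds / 2))
    k4 = rhs(shifted(state, k3, ds), k(s + ds))
    return tuple(
        st + ds / 6 * (a + 2 * b + 2 * c + d)
        for st, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def lifted_ray(start, angle, k, steps, ds=1e-3):
    # Step (ii): pick the initial covector with |p| = n, so that H = 0.
    x0, y0 = start
    p0 = n(x0, y0)
    state = (x0, y0, p0 * math.cos(angle), p0 * math.sin(angle))
    path = [state]
    for i in range(steps):
        # Step (i): integrate Hamilton's equations, stretched by k.
        state = rk4_step(state, i * ds, ds, k)
        path.append(state)
    return path

# Step (iii): two different stretching factors trace out the same lifted ray.
ray_a = lifted_ray((0.0, 0.0), 0.3, lambda s: 1.0, steps=2000)
ray_b = lifted_ray((0.0, 0.0), 0.3, lambda s: 2.0, steps=1000)
```

In this sketch `ray_a` and `ray_b` end at the same point of the cotangent
bundle up to integration error, and the value of `hamiltonian` stays close to
zero along both, which is the numerical counterpart of (5.10).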

From a geometrical point of view it is quite satisfactory to work with the
contact manifold $(\mathcal{N}, \Omega_{\mathcal{N}})$ without referring to
Hamiltonians. On the other hand, the use of (local) Hamiltonians leads to a
formalism that looks more familiar to physicists. For that reason we will
often refer to Hamiltonians.

We end this section with two propositions that are helpful when working 
with local Hamiltonians. The first proposition clarifies the relation between 
two local Hamiltonians for one and the same ray-optical structure, the second 
proposition gives criteria for the existence of a global Hamiltonian. 

Proposition 5.1.3. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$ and assume that the $C^\infty$ function
$H: \mathcal{W} \to \mathbb{R}$ is a local Hamiltonian for $\mathcal{N}$,
defined on some open subset $\mathcal{W}$ of $\mathring{T}^*\mathcal{M}$ with
$\mathcal{N} \cap \mathcal{W} \neq \emptyset$. Then another $C^\infty$ function
$\tilde{H}: \mathcal{W} \to \mathbb{R}$, defined on the same open subset
$\mathcal{W}$ of $\mathring{T}^*\mathcal{M}$, is again a local Hamiltonian for
$\mathcal{N}$ if and only if there is a $C^\infty$ function
$F: \mathcal{W} \to \mathbb{R} \setminus \{0\}$ such that the equation
$\tilde{H} = FH$ holds on $\mathcal{W}$. (By continuity, the function $F$ must
be either everywhere positive or everywhere negative.)

Proof. Since the “if” part is trivial, we just have to prove the “only if”
part. So let us assume that both $H$ and $\tilde{H}$ are local Hamiltonians for
$\mathcal{N}$. Then $F_0(u) = \tilde{H}(u)/H(u)$ defines a $C^\infty$ function
$F_0: \mathcal{W} \setminus \mathcal{N} \to \mathbb{R} \setminus \{0\}$ since
both $H$ and $\tilde{H}$ are non-zero on $\mathcal{W} \setminus \mathcal{N}$.
As $dH$ and $d\tilde{H}$ have no zeros on $\mathcal{N} \cap \mathcal{W}$, the
Bernoulli-l’Hôpital rule guarantees that $F_0$ has a continuous extension
$F: \mathcal{W} \to \mathbb{R} \setminus \{0\}$. At a point
$u \in \mathcal{N} \cap \mathcal{W}$, the value of $F$ is given by
$F(u) = (d\tilde{H})_u(X)/(dH)_u(X)$, where $X$ is any vector in
$T_u(\mathring{T}^*\mathcal{M})$ which is non-tangential to $\mathcal{N}$. What
remains to be shown is that at all points of $\mathcal{N} \cap \mathcal{W}$ the
function $F$ is, indeed, of class $C^r$ for all $r \in \mathbb{N}$. This can be
verified by induction over $r$ where again the Bernoulli-l’Hôpital rule has to
be applied. □

Proposition 5.1.4. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$. Then the following properties are mutually equivalent.

(a) $\mathcal{N}$ admits a global Hamiltonian $H$.

(b) There is a nowhere vanishing characteristic $C^\infty$ vector field $X$ on
$\mathcal{N}$.

(c) There is a nowhere vanishing $C^\infty$ one-form $\alpha$ on $\mathcal{N}$
such that the characteristic direction on $\mathcal{N}$ is transverse to the
kernel of $\alpha$ at all points of $\mathcal{N}$.

(d) $\mathcal{N}$ is orientable, i.e., there is a nowhere vanishing $C^\infty$
$(2n-1)$-form $\varepsilon$ on $\mathcal{N}$.

Proof. The implication “(a)$\Rightarrow$(b)” is trivial since we can choose
$X = X_H|_{\mathcal{N}}$. To prove the implication “(b)$\Rightarrow$(c)” we put
a $C^\infty$ (positive definite) Riemannian metric $g_+$ on $\mathcal{N}$. Such
a metric exists since $\mathcal{N}$ is a finite-dimensional paracompact
manifold; for a proof of this well-known fact see Proposition 2.5.13 in
Abraham and Marsden [1]. Then $\alpha = g_+(X, \cdot\,)$ will do the job. To
prove the implication “(c)$\Rightarrow$(d)” we can put
$\varepsilon = \alpha \wedge \Omega_{\mathcal{N}}^{\,n-1}$ where
$\Omega_{\mathcal{N}}^{\,n-1} = \Omega_{\mathcal{N}} \wedge \cdots \wedge \Omega_{\mathcal{N}}$
with $(n-1)$ factors on the right-hand side. Finally, we prove the implication
“(d)$\Rightarrow$(a)”. By assumption, $\mathcal{N}$ is an orientable
codimension-one submanifold of $\mathring{T}^*\mathcal{M}$. We choose an
orientation for $\mathcal{N}$ and put a (positive definite) $C^\infty$
Riemannian metric $G_+$ on $\mathring{T}^*\mathcal{M}$. (The existence of such
a metric is guaranteed by the same argument as above.) This defines an outward
unit normal vector at each point $u \in \mathcal{N}$. Let
$t \mapsto \phi(u,t)$ denote the affinely parametrized $G_+$-geodesic tangent
to this unit vector at $t = 0$. Then a global Hamiltonian
$H: \mathcal{W} \to \mathbb{R}$ for $\mathcal{N}$ is well-defined on some
neighborhood $\mathcal{W}$ of $\mathcal{N}$ by setting $H(w) = t$ if and only
if there is a $u \in \mathcal{N}$ such that $w = \phi(u,t)$. □

From this proposition we should keep in mind, in particular, that
orientability of $\mathcal{N}$ is equivalent to the existence of a global
Hamiltonian. We are thus in agreement with usual terminology if we define the
choice of an orientation for $\mathcal{N}$ in the following way.

Definition 5.1.3. An orientation for a ray-optical structure $\mathcal{N}$ is
an equivalence class $[H]$ of global Hamiltonians for $\mathcal{N}$. Here two
global Hamiltonians for $\mathcal{N}$ are called equivalent if they are
related, according to Proposition 5.1.3, by a positive function $F$. After an
orientation $[H]$ for $\mathcal{N}$ has been chosen, the parametrization of a
lifted ray is called positively oriented if (5.9) holds with a positive
function $k$ for any $H \in [H]$.



5.2 Regularity notions for ray-optical structures 

In Proposition 5.1.3 we have seen that two local Hamiltonians $H$ and
$\tilde{H}$ for one and the same ray-optical structure are related, on their
common domain of definition, by an equation of the form $\tilde{H} = FH$ where
$F$ is a nowhere vanishing function. As a consequence, their derivatives with
respect to the momentum coordinates in any natural chart have to satisfy the
equations

    $\tilde{H}^a = F H^a$ and $\tilde{H}^{ab} = F H^{ab} + F^a H^b + H^a F^b$    (5.13)

on $\mathcal{N}$, where the abbreviations (4.7) have been used for the
functions $H$, $\tilde{H}$, and $F$. This implies, in particular, that the
usual regularity condition (4.6) cannot be satisfied at a point
$u \in \mathcal{N}$ by all Hamiltonians for $\mathcal{N}$. If this regularity
condition is satisfied by one Hamiltonian for $\mathcal{N}$ it can always be
spoiled by switching to another Hamiltonian with the help of an appropriate
function $F$, according to (5.13). Therefore we define regularity of
ray-optical structures in the following way.

Definition 5.2.1. A ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ is
called regular at a point $u \in \mathcal{N}$ if there is a local Hamiltonian
$H$ for $\mathcal{N}$, defined on some neighborhood $\mathcal{W}$ of $u$, such
that the fiber derivative $\mathbb{F}H$ of $H$ maps
$\mathcal{N} \cap \mathcal{W}$ diffeomorphically onto its image in
$\mathring{T}\mathcal{M}$. This is true if and only if $H$ satisfies condition
(4.6) at $u$ in any natural chart. A ray-optical structure $\mathcal{N}$ on
$\mathcal{M}$ is called hyperregular if there is a global Hamiltonian $H$ for
$\mathcal{N}$ such that the fiber derivative $\mathbb{F}H$ of $H$ maps
$\mathcal{N}$ diffeomorphically onto its image in $\mathring{T}\mathcal{M}$.

If we take a look at our five examples of ray-optical structures mentioned
above, we find that Example 5.1.1, Example 5.1.2 and Example 5.1.5 give
hyperregular ray-optical structures. In each of these cases the fiber
derivative of the given Hamiltonian is a global diffeomorphism
$\mathbb{F}H: \mathring{T}^*\mathcal{M} \to \mathring{T}\mathcal{M}$. The
ray-optical structures of Example 5.1.3 and Example 5.1.4, on the other hand,
are nowhere regular, provided that $\dim(\mathcal{M}) > 2$. It is easy to check
that on a two-dimensional manifold all ray-optical structures are everywhere
regular.

The best strategy to verify results of this kind is the following. To find out
whether a ray-optical structure $\mathcal{N}$ is regular at some point
$u \in \mathcal{N}$, we choose a local Hamiltonian $H$ and a natural chart
around $u$. As before, we use the abbreviations (4.7). If
$\det(H^{ab}(u)) \neq 0$, it is obvious that $\mathcal{N}$ is regular at $u$.
If $\det(H^{ab}(u)) = 0$, we consider the second-order polynomial

    $P_u(X^1, \ldots, X^n) = \det\big(H^{ab}(u) + H^a(u)\, X^b + X^a H^b(u)\big)$.    (5.14)

If this is the zero polynomial, i.e., if the equation
$P_u(X^1, \ldots, X^n) = 0$ is satisfied by all
$(X^1, \ldots, X^n) \in \mathbb{R}^n$, we know that $\mathcal{N}$ cannot be
regular at $u$. This follows immediately from the fact that any other local
Hamiltonian $\tilde{H}$ for $\mathcal{N}$ must be related to $H$ by the
transformation formulae (5.13). If, on the other hand, there is an
$(X^1, \ldots, X^n) \in \mathbb{R}^n$ with $P_u(X^1, \ldots, X^n) \neq 0$,
$\mathcal{N}$ must be regular at $u$. In order to prove this we switch to a new
Hamiltonian $\tilde{H} = FH$, choosing the function $F$ in such a way that
$F(u) = 1$ and $F^a(u) = X^a$. Then we can read from (5.14) that
$\det(\tilde{H}^{ab}(u))$ is equal to $P_u(X^1, \ldots, X^n)$ which, by
assumption, is different from zero.
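
This test is easy to try on Example 5.1.3, where $H = U^a p_a$ gives
$H^{ab} = 0$ and $H^a = U^a$. The sketch below is only illustrative: it assumes
that (5.14) reads $P_u(X) = \det(H^{ab} + H^a X^b + X^a H^b)$, and the sample
values of $U$ and $X$ as well as the function names are hypothetical.

```python
from itertools import product

def det(m):
    # Determinant by Laplace expansion along the first row (fine for small n).
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def regularity_polynomial(H2, H1, X):
    # P_u(X) = det(H^{ab} + H^a X^b + X^a H^b), cf. (5.14).
    n = len(H1)
    return det([[H2[a][b] + H1[a] * X[b] + X[a] * H1[b] for b in range(n)]
                for a in range(n)])

def example_513(U):
    # H = U^a p_a gives H^{ab} = 0 and H^a = U^a (U is a sample value).
    n = len(U)
    return [[0] * n for _ in range(n)], list(U)

# dim = 2: one X with P_u(X) != 0 suffices for regularity at u.
H2, H1 = example_513((1, 0))
p_dim2 = regularity_polynomial(H2, H1, (0, 1))

# dim = 3: P_u vanishes on a whole grid of X, consistent with non-regularity.
H2, H1 = example_513((1, 2, 3))
p_dim3 = [regularity_polynomial(H2, H1, X)
          for X in product(range(-2, 3), repeat=3)]
```

In three dimensions the matrix $U^a X^b + X^a U^b$ has rank at most two, which
is why every entry of `p_dim3` vanishes; in two dimensions the same matrix can
be invertible, as `p_dim2` shows.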

If $\mathcal{N}$ is regular at $u$, an appropriate choice of a Hamiltonian near
$u$ gives us a local one-to-one correspondence between momentum covectors and
velocity vectors. It is clear that a ray-optical structure alone, without
choosing a particular Hamiltonian, cannot give such a correspondence since
lifted rays can be arbitrarily reparametrized. Under such a reparametrization
the momentum covectors remain unchanged whereas the velocity vectors are
multiplied by a scalar factor. This observation suggests a different
regularity notion for ray-optical structures. We want to call a ray-optical
structure “strongly regular” if it gives a local one-to-one correspondence
between momentum covectors and directions of velocity vectors. To make this
precise we consider the vertical part of the ray equation together with the
dispersion relation in a natural chart, i.e., the equations (5.10) and (5.11).
The desired local one-to-one correspondence holds if and only if this system
of equations can be solved for the momentum coordinates
$p_1(s), \ldots, p_n(s)$ and for the stretching factor $k(s)$. This
solvability condition can be written in the form

    $\det \begin{pmatrix} (H^{ab}) & (H^a) \\ (H^b) & 0 \end{pmatrix} \neq 0$    (5.15)

where $a$ is an index numbering rows and $b$ is an index numbering columns
such that we get an $(n+1) \times (n+1)$ matrix on the left-hand side of
(5.15). With the help of (5.13) it is easy to check that (5.15) is, indeed,
independent of the Hamiltonian chosen. Switching back to coordinate-free
notation, this leads to the following definition.

Definition 5.2.2. A ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ is
called strongly regular at a point $u \in \mathcal{N}$ if for one and hence for
any Hamiltonian $H: \mathcal{W} \to \mathbb{R}$, defined on a sufficiently
small neighborhood $\mathcal{W}$ of $u$ in $\mathring{T}^*\mathcal{M}$, the map
$\sigma_H: (\mathcal{N} \cap \mathcal{W}) \times \mathbb{R}^+ \to T\mathcal{M}$
defined by

    $\sigma_H(w,c) = c\, \mathbb{F}H(w)$    (5.16)

is a diffeomorphism onto its image. This is the case if and only if in any
natural chart condition (5.15) holds at $u$. A ray-optical structure
$\mathcal{N}$ on $\mathcal{M}$ is called strongly hyperregular if there is a
global Hamiltonian $H$ for $\mathcal{N}$ such that the map
$\sigma_H: \mathcal{N} \times \mathbb{R}^+ \to T\mathcal{M}$ defined by (5.16)
is a diffeomorphism onto its image. Here $\mathbb{F}H$ denotes the fiber
derivative of $H$ and $\mathbb{R}^+$ denotes the set of strictly positive real
numbers.

Strong regularity is easier to check than regularity; we just have to
calculate the left-hand side of (5.15) with any Hamiltonian for $\mathcal{N}$.

If we look at our standard examples, we find that Example 5.1.2 and
Example 5.1.5 give strongly hyperregular ray-optical structures. On the other
hand, Example 5.1.1 gives ray-optical structures which are nowhere strongly
regular. The same is true of Example 5.1.3 and Example 5.1.4 if we exclude
the trivial case $\dim(\mathcal{M}) = 1$. This shows that strong regularity is
violated for several physically interesting ray-optical structures on
spacetimes. (In the next section we shall see that a ray-optical structure
that describes light propagation in a non-dispersive medium on a spacetime
cannot be strongly regular.) Nonetheless strong regularity is a physically
useful notion, in particular in the case that $\mathcal{M}$ is to be
interpreted as space rather than as spacetime.
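
These statements about the metric examples can be spot-checked by evaluating
the left-hand side of (5.15) at a single covector. The sketch below assumes
that (5.15) is the determinant of the bordered matrix built from $(H^{ab})$ and
$(H^a)$, and it uses flat sample metrics (Minkowski and Euclidean in three
dimensions) and sample covectors chosen purely for illustration.

```python
def det(m):
    # Determinant by Laplace expansion along the first row (fine for small n).
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def bordered_determinant(H2, H1):
    # Left-hand side of (5.15): det [[H^{ab}, H^a], [H^b, 0]].
    n = len(H1)
    rows = [list(H2[a]) + [H1[a]] for a in range(n)]
    rows.append(list(H1) + [0])
    return det(rows)

def metric_example(g_inv, p):
    # For H = 1/2 (g^{ab} p_a p_b + const): H^{ab} = g^{ab}, H^a = g^{ab} p_b.
    n = len(p)
    H1 = [sum(g_inv[a][b] * p[b] for b in range(n)) for a in range(n)]
    return g_inv, H1

eta_inv = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]    # inverse Minkowski metric (sample)
delta_inv = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # inverse Euclidean metric (sample)

# Example 5.1.1: p is light-like, eta^{ab} p_a p_b = 0.
d_511 = bordered_determinant(*metric_example(eta_inv, (1, 1, 0)))
# Example 5.1.2: p satisfies eta^{ab} p_a p_b = -1.
d_512 = bordered_determinant(*metric_example(eta_inv, (1, 0, 0)))
# Example 5.1.5: p satisfies delta^{ab} p_a p_b = +1.
d_515 = bordered_determinant(*metric_example(delta_inv, (1, 0, 0)))
```

The determinant vanishes for the light-like covector and is non-zero in the
other two cases, in line with the classification given above.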

We have not yet justified our terminology by showing that, indeed, strong 
regularity implies regularity. This follows from the next proposition. 

Proposition 5.2.1. A ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ is
strongly regular at a point $u \in \mathcal{N}$ if and only if there exists a
local Hamiltonian $H$ for $\mathcal{N}$ on some neighborhood of $u$ such that
in any natural chart

(a) $\det(H^{ab}) \neq 0$ and

(b) $H_{ab} H^a H^b \neq 0$

at $u$. Here we use again the abbreviations (4.7) and $H_{ab}$ is defined
through $H_{ab} H^{bc} = \delta_a^c$.

Proof. The “if” part is a trivial exercise in linear algebra. To prove the
“only if” part, we assume that $\mathcal{N}$ is strongly regular at $u$ and fix
a local Hamiltonian $H$ for $\mathcal{N}$ on a neighborhood $\mathcal{W}$ of
$u$. If $H$ satisfies (a) we are done since in this case, by (5.15), (b) is
also satisfied. So let us assume that $H$ does not satisfy (a). It is our goal
to find another Hamiltonian $\tilde{H} = FH$ for $\mathcal{N}$ such that (a)
and, thus, (b) are satisfied if $H$ is replaced with $\tilde{H}$. As, by
assumption, the kernel of the matrix $(H^{ab})$ is non-trivial, our strong
regularity assumption (5.15) guarantees existence and uniqueness of a vertical
vector $U_b\, \partial/\partial p_b$ at $u$ which satisfies $H^{ab} U_b = 0$
and $H^b U_b = 1$. If $\mathcal{W}$ is sufficiently small, the function
$\tilde{H} = H(H + 1): \mathcal{W} \to \mathbb{R}$ is, again, a local
Hamiltonian for $\mathcal{N}$. It is our goal to prove that the kernel of the
matrix $(\tilde{H}^{ab}) = (H^{ab} + 2 H^a H^b)$ is trivial. So let us assume
that $\tilde{H}^{ab} Y_b = 0$. We want to demonstrate that this implies
$Y_b = 0$. As $H^a U_a = 1$, $Y_b$ can be decomposed in the form
$Y_b = Z_b + c\, U_b$ where $H^b Z_b = 0$. Then our assumption takes the form
$\tilde{H}^{ab} Y_b = H^{ab} Z_b + 2 c H^a = 0$. Owing to our strong regularity
assumption this implies that the column vector with components
$Z_1, \ldots, Z_n, 2c$ is in the kernel of a matrix with non-zero determinant;
hence, all these components are zero. □
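
The linear algebra behind the “if” part can be sketched with a
Schur-complement factorization. The sketch assumes that condition (5.15) is
the non-vanishing of the determinant of the bordered matrix built from
$(H^{ab})$ and $(H^a)$; this block form is a reconstruction from the
surrounding text, not a quotation.

```latex
% Assumption: det(H^{ab}) \neq 0, with H_{ab} the inverse matrix, H_{ab} H^{bc} = \delta_a^c.
\begin{pmatrix} (H^{ab}) & (H^a) \\ (H^b) & 0 \end{pmatrix}
=
\begin{pmatrix} \mathbf{1} & 0 \\ (H^c H_{cb}) & 1 \end{pmatrix}
\begin{pmatrix} (H^{ab}) & (H^a) \\ 0 & -\,H_{cd} H^c H^d \end{pmatrix}
\quad\Longrightarrow\quad
\det \begin{pmatrix} (H^{ab}) & (H^a) \\ (H^b) & 0 \end{pmatrix}
= -\,\det (H^{ab}) \; H_{cd} H^c H^d .
```

Hence, once (a) holds, the bordered determinant is non-zero exactly when (b)
holds.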

This proposition shows that, at the level of local Hamiltonians, our strong 
regularity notion is equivalent to the so-called Condition N introduced by 
Guckenheimer [53]. 

To further illustrate strong regularity we introduce the following notation.

Definition 5.2.3. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$. For any $q \in \mathcal{M}$, the set

    $\mathcal{C}_q = \{\dot{\lambda}(0) \mid \lambda$ is a ray with $\lambda(0) = q\}$    (5.17)

is called the infinitesimal light cone of $\mathcal{N}$ at $q$. The set

    $\mathcal{C} = \{X \in \mathcal{C}_q \mid q \in \mathcal{M}\}$    (5.18)

is called the bundle of infinitesimal light cones of $\mathcal{N}$.

Here we allow ourselves a slight abuse of language insofar as $\mathcal{C}$ is
not necessarily a fiber bundle over $\mathcal{M}$. As rays can be arbitrarily
reparametrized, $X \in \mathcal{C}_q$ implies that $cX \in \mathcal{C}_q$ for
all $c \in \mathbb{R} \setminus \{0\}$. This justifies calling $\mathcal{C}_q$
a “cone”. Clearly, a vector $X \in \mathring{T}_q\mathcal{M}$ is in
$\mathcal{C}_q$ if and only if it is of the form $X = \mathbb{F}H(u)$ where
$u \in \mathcal{N}_q$ and $H$ is a local Hamiltonian for $\mathcal{N}$ which is
defined around $u$. Moreover, for any local Hamiltonian $H$, the image of the
map $\sigma_H$ defined through (5.16) is a subset of $\mathcal{C}$. Note,
however, that even for a global Hamiltonian the image of $\sigma_H$ does not
necessarily cover all of $\mathcal{C}$; the reason is that the image of
$\sigma_H$ need not be invariant under multiplication with negative numbers.

In the case of Example 5.1.1,
$\mathcal{C}_q = \{X \in \mathring{T}_q\mathcal{M} \mid g_o(X,X) = 0\}$,
i.e., $\mathcal{C}_q$ equals the null cone of the Lorentzian metric $g_o$ at
$q$. In particular, $\mathcal{C}_q$ is a closed codimension-one submanifold of
$\mathring{T}_q\mathcal{M}$ in this case. The situation is completely different
in the case of Example 5.1.2. Here
$\mathcal{C}_q = \{X \in \mathring{T}_q\mathcal{M} \mid g_o(X,X) < 0\}$ equals
the interior of the null cone of the Lorentzian metric $g_o$ and is, thus, an
open subset of $\mathring{T}_q\mathcal{M}$. For the ray-optical structures of
Example 5.1.3 and Example 5.1.4,
$\mathcal{C}_q = \{c\, U_q \mid c \in \mathbb{R} \setminus \{0\}\}$ is a
one-dimensional submanifold of $\mathring{T}_q\mathcal{M}$ whereas in the case
of Example 5.1.5 $\mathcal{C}_q$ is all of $\mathring{T}_q\mathcal{M}$.

These examples show that $\mathcal{C}$ has very different features for
different ray-optical structures. Moreover, they demonstrate that there is no
obvious relation between the geometry of $\mathcal{N}$ and the geometry of
$\mathcal{C}$. In the case of Example 5.1.1, which dominates our intuitive
ideas of general relativistic ray optics, $\mathcal{N}$ and $\mathcal{C}$ are
diffeomorphic. In the other cases, however, $\mathcal{C}$ looks completely
different from $\mathcal{N}$.

The following proposition implies that in the strongly regular case
$\mathcal{C}$ cannot be diffeomorphic to $\mathcal{N}$.

Proposition 5.2.2. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$ and $q \in \mathcal{M}$. If $\mathcal{N}$ is strongly regular at
all points $u \in \mathcal{N}_q$, the infinitesimal light cone $\mathcal{C}_q$
is an open subset of $\mathring{T}_q\mathcal{M}$.

Proof. By Definition 5.2.2, strong regularity implies that the differential of
the map $\sigma_H$ has maximal rank. □

This observation is exemplified by Example 5.1.2 and Example 5.1.5.
Please recall that strong regularity guarantees that the system of equations
(5.10) and (5.11) can be solved for the momentum coordinates $p_a(s)$ and for
the stretching factor $k(s)$. It is worthwhile to become clear about the
information contained in this system of equations. If a curve
$s \mapsto (x(s), p(s))$ satisfies (5.10) and (5.11) with some $k(s)$ but not
necessarily (5.12), it determines at each of its points the same velocity
vector as a lifted ray passing through this point. Hence this curve, although
not necessarily a lifted ray, describes an object moving at the velocity of
light (in the medium for which $\mathcal{N}$ gives the dispersion relation).
We now introduce a special name for such curves.

Definition 5.2.4. Let $\mathcal{N}$ be an arbitrary ray-optical structure on
$\mathcal{M}$. A $C^\infty$ immersion $\xi: I \to \mathcal{N}$ from a real
interval $I$ into $\mathcal{N}$ is called a lifted virtual ray iff

    $(\Omega_{\mathcal{N}})_{\xi(s)}\big(\dot{\xi}(s), Z_{\xi(s)}\big) = 0$    (5.19)

for all $s \in I$ and all $Z_{\xi(s)} \in T_{\xi(s)}\mathcal{N}$ with
$(T\tau^*_{\mathcal{M}})(Z_{\xi(s)}) = 0$. Then the projected curve
$\tau^*_{\mathcal{M}} \circ \xi: I \to \mathcal{M}$ is called a virtual ray.

This notion of “virtual rays” should not be confused with the notion of 
“virtual images” which is used in elementary optics. 

It follows immediately from the definitions that lifted virtual rays and 
virtual rays can be characterized in the following way. 

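Proposition 5.2.3. Let $\mathcal{N}$ be any ray-optical structure on
$\mathcal{M}$. Then a $C^\infty$ immersion $\xi: I \to \mathcal{N}$ is a lifted
virtual ray if and only if

    $(\tau^*_{\mathcal{M}} \circ \xi)^{\boldsymbol{\cdot}}(s) = k(s)\, \mathbb{F}H\big(\xi(s)\big)$    (5.20)

with some function $k: I \to \mathbb{R} \setminus \{0\}$. Here $H$ is any
(local) Hamiltonian for the ray-optical structure $\mathcal{N}$.

A $C^\infty$ immersion $\lambda: I \to \mathcal{M}$ is a virtual ray if and
only if $\dot{\lambda}(s) \in \mathcal{C}_{\lambda(s)}$ for all $s \in I$. Here
$\mathcal{C}_{\lambda(s)}$ denotes the infinitesimal light cone introduced in
Definition 5.2.3.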

Clearly, every ray is a virtual ray, whereas the converse is in general not
true. In the case of Example 5.1.1 all $g_o$-light-like curves in
$\mathcal{M}$ are virtual rays but only the $g_o$-light-like geodesics are
rays. In the case of Example 5.1.2 the rays are the $g_o$-time-like geodesics
whereas all $g_o$-time-like curves are virtual rays. In the case of
Example 5.1.3 and 5.1.4 an immersed curve in $\mathcal{M}$ is a ray iff it is
a virtual ray iff it is an arbitrarily parametrized integral curve of the
vector field $U$. In the case of Example 5.1.5 all immersed curves in
$\mathcal{M}$ are virtual rays whereas only the $g_+$-geodesics are rays.

If $\mathcal{N}$ is orientable, we can generalize Definition 5.1.3 in the
following way.

Definition 5.2.5. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$ and $[H]$ be an orientation for $\mathcal{N}$. Then a lifted
virtual ray $\xi$ of $\mathcal{N}$ is called positively oriented if (5.20)
holds with a positive function $k$ for any $H \in [H]$. Similarly, a virtual
ray is called positively oriented if it is the projection of a positively
oriented lifted virtual ray.

In the next proposition we prove that in the strongly hyperregular case
there is a global one-to-one correspondence between positively oriented lifted
virtual rays and positively oriented virtual rays, i.e., that at each point of
a positively oriented virtual ray the momentum covector is uniquely
determined.

Proposition 5.2.4. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$ and assume that $\mathcal{N}$ is strongly hyperregular. By
Definition 5.2.2 this guarantees the existence of a global Hamiltonian $H$ for
$\mathcal{N}$ such that the map
$\sigma_H: \mathcal{N} \times \mathbb{R}^+ \to T\mathcal{M}$ defined through
(5.16) is a global diffeomorphism onto its image. Choose such a Hamiltonian,
thereby defining an orientation $[H]$ for $\mathcal{N}$. Then for every
positively oriented virtual ray $\lambda: I \to \mathcal{M}$ there is a unique
positively oriented lifted virtual ray $\xi: I \to \mathcal{N}$ that projects
onto $\lambda$, $\tau^*_{\mathcal{M}} \circ \xi = \lambda$.

Proof. The nontrivial claim is the uniqueness of $\xi$. So let us assume that
$\xi_1$ and $\xi_2$ do the job. Since lifted virtual rays have to satisfy
(5.20), this implies that
$k_1\, \mathbb{F}H \circ \xi_1 = k_2\, \mathbb{F}H \circ \xi_2$. Since both
$\xi_1$ and $\xi_2$ are supposed to be positively oriented, $k_1$ and $k_2$
have to be positive such that the last equation can be written in the form
$\sigma_H\big(\xi_1(s), k_1(s)\big) = \sigma_H\big(\xi_2(s), k_2(s)\big)$ for
all $s \in I$. Since $\sigma_H$ is a diffeomorphism, this implies
$\xi_1 = \xi_2$. □

Example 5.1.5 demonstrates that the restriction to positively oriented 
lifted virtual rays is, indeed, necessary to get uniqueness. 
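
The non-uniqueness can be made explicit in the flat case. The sketch below
assumes Example 5.1.5 with the Euclidean metric on the plane, so that
$H(p) = \frac{1}{2}(\delta^{ab} p_a p_b - 1)$ and the fiber derivative has
components $H^a = p_a$; the function names and the sample velocity are
hypothetical.

```python
import math

def hamiltonian(p):
    # H(p) = 1/2 (delta^{ab} p_a p_b - 1): Example 5.1.5 with the flat metric.
    return 0.5 * (sum(c * c for c in p) - 1.0)

def fiber_derivative(p):
    # For this H the fiber derivative has components H^a = delta^{ab} p_b = p_a.
    return tuple(p)

def lifts(velocity):
    # All pairs (p, k) with H(p) = 0 and velocity = k * FH(p), k != 0.
    speed = math.sqrt(sum(c * c for c in velocity))
    unit = tuple(c / speed for c in velocity)
    return [(unit, speed), (tuple(-c for c in unit), -speed)]

velocity = (3.0, 4.0)
candidates = lifts(velocity)
# Only one of the two momentum covectors belongs to a positive factor k.
positive = [(p, k) for p, k in candidates if k > 0]
```

Both candidates satisfy the dispersion relation and reproduce the given
velocity, but only one of them does so with a positive stretching factor.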



5.3 Symmetries of ray-optical structures 

As the symmetries of a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ we
can view all diffeomorphisms on $\mathring{T}^*\mathcal{M}$ that leave
$\mathcal{N}$ invariant. For our purposes it will be reasonable to restrict to
those diffeomorphisms on $\mathring{T}^*\mathcal{M}$ that are induced from
diffeomorphisms on the base manifold $\mathcal{M}$. (Such diffeomorphisms are
called “point transformations” in Hamiltonian mechanics.) To work this out,
we have to recall that each diffeomorphism
$\psi: \mathcal{M} \to \mathcal{M}$ induces a cotangent map
$T^*\psi: T^*\mathcal{M} \to T^*\mathcal{M}$ which is again a diffeomorphism,
defined by the equation $\big((T^*\psi)(u)\big)(X) = u\big(T\psi(X)\big)$ for
all $q \in \mathcal{M}$, $X \in T_q\mathcal{M}$ and
$u \in T^*_{\psi(q)}\mathcal{M}$. It is well known and easily verified that
$T^*\psi$ leaves the canonical one-form $\theta$ and, thus, the canonical
two-form $\Omega$ invariant, i.e.,

    $(T^*\psi)^*\theta = \theta$ and $(T^*\psi)^*\Omega = \Omega$.    (5.21)

For an invariant proof we refer to Abraham and Marsden [1], Theorem 3.2.12.
As an alternative, the proof can be accomplished easily in a natural chart. If
$\psi$ is represented, in a local chart, by a map $x \mapsto x'$, $T^*\psi$ is
represented in the pertaining natural chart by the map
$(x,p) \mapsto (x',p')$ with $p'$ given by (2.71).

After these preparations we are now ready to introduce the following
definition.

Definition 5.3.1. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$. The symmetry group $G_{\mathcal{N}}$ of $\mathcal{N}$ is, by
definition, the set of all diffeomorphisms $\psi: \mathcal{M} \to \mathcal{M}$
such that $T^*\psi$ leaves $\mathcal{N}$ invariant, i.e., such that
$(T^*\psi)(u) \in \mathcal{N}$ for all $u \in \mathcal{N}$.

Clearly, $G_{\mathcal{N}}$ is a group with respect to composition of maps.

For the sake of illustration we take a look at the symmetry groups of our
standard examples. In the case of Example 5.1.1, where $\mathcal{N}$ is the
set of all light-like covectors of a Lorentzian metric $g_o$, the symmetry
group $G_{\mathcal{N}}$ consists of all diffeomorphisms
$\psi: \mathcal{M} \to \mathcal{M}$ for which the pulled back metric
$\psi^* g_o$ has the same cone bundle as $g_o$. This is the case if and only
if $\psi^* g_o$ is conformally equivalent to $g_o$, i.e., if and only if
$\psi^* g_o = e^{2f} g_o$ with some $C^\infty$ function
$f: \mathcal{M} \to \mathbb{R}$. For a proof of this well-known fact we refer,
e.g., to Wald [146], p. 445. Hence, in the case of Example 5.1.1 the symmetry
group $G_{\mathcal{N}}$ equals the set of all conformal symmetries of the
Lorentzian manifold $(\mathcal{M}, g_o)$. In particular, this implies that
$G_{\mathcal{N}}$ is a finite-dimensional Lie group. Similarly, in the case of
Example 5.1.2 we find that the symmetry group is the group of all isometries
of the metric $g_o$. Again, this is a finite-dimensional Lie group. In the
case of Example 5.1.3, on the other hand, the symmetry group $G_{\mathcal{N}}$
consists of all diffeomorphisms $\psi: \mathcal{M} \to \mathcal{M}$ that map
arbitrarily parametrized integral curves of $U$ onto arbitrarily parametrized
integral curves of $U$. In general, this is an infinite-dimensional subgroup
of the diffeomorphism group $\mathrm{Diff}(\mathcal{M})$. The same result is
found for Example 5.1.4, with the only difference that the diffeomorphisms
$\psi$ have to respect the parametrization adapted to $U$ in addition.
Finally, in the case of Example 5.1.5, the symmetry group is the group of
isometries of the Riemannian metric $g_+$.
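
The difference between conformal symmetries and isometries can be
spot-checked on two-dimensional Minkowski space. The sketch below is an
illustration under stated assumptions: it uses linear maps $x \mapsto Ax$, for
which the momentum part of the cotangent map is taken to be $p \mapsto A^T p$
(the base-point bookkeeping plays no role for a constant metric), and it tests
sample covectors only, so it is a plausibility check rather than a proof. All
names are hypothetical.

```python
from fractions import Fraction as Fr

def cotangent_map(A, p):
    # Momentum part of the cotangent map of the linear map x -> A x: p -> A^T p.
    n = len(p)
    return tuple(sum(A[b][a] * p[b] for b in range(n)) for a in range(n))

def norm_sq(g_inv, p):
    # g^{ab} p_a p_b
    n = len(p)
    return sum(g_inv[a][b] * p[a] * p[b] for a in range(n) for b in range(n))

eta_inv = [[-1, 0], [0, 1]]          # inverse Minkowski metric in two dimensions
dilation = [[2, 0], [0, 2]]          # conformal map, not an isometry
boost = [[Fr(5, 4), Fr(3, 4)],       # Lorentz boost with cosh = 5/4, sinh = 3/4
         [Fr(3, 4), Fr(5, 4)]]

null_p = (1, 1)   # in N for Example 5.1.1: eta^{ab} p_a p_b = 0
unit_p = (1, 0)   # in N for Example 5.1.2: eta^{ab} p_a p_b = -1

dilated_null = norm_sq(eta_inv, cotangent_map(dilation, null_p))
dilated_unit = norm_sq(eta_inv, cotangent_map(dilation, unit_p))
boosted_null = norm_sq(eta_inv, cotangent_map(boost, null_p))
boosted_unit = norm_sq(eta_inv, cotangent_map(boost, unit_p))
```

The dilation keeps the light-like covector light-like but moves the second
covector off the hyperboloid $\eta^{ab} p_a p_b = -1$, whereas the boost
preserves both conditions.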

Next we want to show that for any $\psi \in G_{\mathcal{N}}$ the induced
cotangent map $T^*\psi$ maps lifted rays onto lifted rays. For later
convenience we prove the following more general proposition.

Proposition 5.3.1. Let $\mathcal{N}$ be a ray-optical structure on
$\mathcal{M}$ and
$\Psi: \mathring{T}^*\mathcal{M} \to \mathring{T}^*\mathcal{M}$ be a
$C^\infty$ diffeomorphism. Assume that $\Psi$ leaves the canonical two-form
$\Omega$ invariant up to a factor, i.e., $\Psi^*\Omega = f\, \Omega$ with some
function $f: \mathring{T}^*\mathcal{M} \to \mathbb{R}$, and that $\Psi$ is
fiber preserving, i.e.,
$\tau^*_{\mathcal{M}}(\Psi(u_1)) = \tau^*_{\mathcal{M}}(\Psi(u_2))$ whenever
$\tau^*_{\mathcal{M}}(u_1) = \tau^*_{\mathcal{M}}(u_2)$. Then the following
properties are mutually equivalent.

(a) $\Psi$ leaves $\mathcal{N}$ invariant, i.e., $\Psi(u) \in \mathcal{N}$ for
all $u \in \mathcal{N}$.

(b) $\Psi$ maps each lifted ray of $\mathcal{N}$ onto a lifted ray.

(c) $\Psi$ maps each lifted virtual ray of $\mathcal{N}$ onto a lifted virtual
ray.

Proof. First we assume that (a) is satisfied. To prove that then (b) and (c)
hold true, we pull back the equation $\Psi^*\Omega = f\, \Omega$ to
$\mathcal{N}$. As the diffeomorphism $\Psi$ leaves $\mathcal{N}$ invariant and
$\mathcal{N}$ is a closed submanifold of $\mathring{T}^*\mathcal{M}$, $\Psi$
maps $\mathcal{N}$ diffeomorphically onto itself. Hence,
$\Psi_{\mathcal{N}}^*\, \Omega_{\mathcal{N}} = f|_{\mathcal{N}}\, \Omega_{\mathcal{N}}$
where $\Psi_{\mathcal{N}}: \mathcal{N} \to \mathcal{N}$ denotes the restriction
of $\Psi$ to $\mathcal{N}$. This implies that, for any $C^\infty$ immersion
$\xi: I \to \mathcal{N}$,

    $(\Omega_{\mathcal{N}})_{\Psi(\xi(s))}\big((\Psi \circ \xi)^{\boldsymbol{\cdot}}(s), \cdot\,\big) = f(\xi(s))\, (\Omega_{\mathcal{N}})_{\xi(s)}\big(\dot{\xi}(s), (T\Psi_{\mathcal{N}})^{-1}(\,\cdot\,)\big)$.    (5.22)

Now let us assume that $\xi$ is a lifted ray of $\mathcal{N}$. Then, by (5.7),
the right-hand side of (5.22) vanishes, so the left-hand side has to vanish as
well, i.e., $\Psi \circ \xi$ has to be a lifted ray as well. This proves the
implication “(a)$\Rightarrow$(b)”. To prove the implication
“(a)$\Rightarrow$(c)”, let us assume that $\xi$ is a lifted virtual ray. Then,
by Definition 5.2.4, the right-hand side of (5.22) vanishes on all vectors $X$
such that $(T\Psi)^{-1}(X)$ is vertical. Since $\Psi$ is fiber-preserving, the
latter condition is equivalent to $X$ being vertical. Thus, the left-hand side
of (5.22) has to vanish on all vertical vectors, i.e., $\Psi \circ \xi$ has to
be a lifted virtual ray. The proof of the converse implications
“(b)$\Rightarrow$(a)” and “(c)$\Rightarrow$(a)” is trivial since a point
$u \in \mathring{T}^*\mathcal{M}$ is in $\mathcal{N}$ if and only if there is a
lifted ray through $u$ if and only if there is a lifted virtual ray through
$u$. □

If we apply this proposition to the map $\Psi = T^*\psi$ we see that a
diffeomorphism $\psi\colon \mathcal{M} \to \mathcal{M}$ is in $G_{\mathcal{N}}$ if and only if its cotangent map
$T^*\psi$ maps lifted rays onto lifted rays if and only if $T^*\psi$ maps lifted virtual
rays onto lifted virtual rays. This implies, in particular, that any $\psi \in G_{\mathcal{N}}$
maps rays onto rays. Please note, however, that the converse is not true. A
diffeomorphism $\psi\colon \mathcal{M} \to \mathcal{M}$ that maps rays onto rays need not be in $G_{\mathcal{N}}$.
This is in correspondence with our earlier observation that two different ray-
optical structures on $\mathcal{M}$ may have the same rays. As an example we may
consider a ray-optical structure constructed from a (positive definite)
Riemannian metric $g_+$ as in Example 5.1.5. Then a diffeomorphism $\psi\colon \mathcal{M} \to \mathcal{M}$
such that $\psi^* g_+ = c\, g_+$ with a positive constant $c \neq 1$ maps rays onto rays
but it is not in the symmetry group $G_{\mathcal{N}}$.

As an alternative, symmetries can be treated in terms of infinitesimal
generators. This gives a symmetry algebra rather than a symmetry group.
To make this definition precise we need the well-known fact that each vector
field $K$ on $\mathcal{M}$ defines a vector field $\hat{K}$ on $T^*\mathcal{M}$ which is called the canonical
lift of $K$. Let

$$ \phi\colon \mathcal{V} \subseteq \mathbb{R} \times \mathcal{M} \to \mathcal{M} \qquad (5.23) $$

denote the flow of $K$, i.e., let $t \mapsto \phi_t(q)$ denote the integral curve of $K$ that
passes at $t = 0$ through the point $q$. Then the vector field $\hat{K}$ is defined by the
condition that its flow $\hat{\phi}$ is given by the equation $\hat{\phi}_t = T^*\phi_{-t}$. In a natural
chart the canonical lift of the vector field $K = K^a(x)\,\partial/\partial x^a$ takes the form

$$ \hat{K} = K^a \frac{\partial}{\partial x^a} - \frac{\partial K^b}{\partial x^a}\, p_b\, \frac{\partial}{\partial p_a} . \qquad (5.24) $$

Comparison with (4.2) shows that $\hat{K}$ is the Hamiltonian vector field of the
function $H = \theta(\hat{K})\colon T^*\mathcal{M} \to \mathbb{R}$, i.e., $\hat{K} = X_H$. In terms of a natural chart
this function takes the form $H(x,p) = p_a K^a(x)$.

Now the symmetry algebra of a ray-optical structure can be defined in
the following way.

Definition 5.3.2. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. The symmetry
algebra $\mathcal{G}_{\mathcal{N}}$ of $\mathcal{N}$ is, by definition, the set of all $C^\infty$ vector fields $K$ on $\mathcal{M}$
such that at all points of $\mathcal{N}$ the canonical lift $\hat{K}$ of $K$ is tangent to $\mathcal{N}$.

It is easy to check that $\mathcal{G}_{\mathcal{N}}$ is a Lie algebra with respect to the usual Lie
bracket of vector fields. Clearly, the one-parameter subgroups of $G_{\mathcal{N}}$ are in
one-to-one correspondence with the complete vector fields in $\mathcal{G}_{\mathcal{N}}$.

In analogy to (5.21) the canonical lift of a vector field $K$ satisfies

$$ L_{\hat{K}}\theta = 0 \quad\text{and}\quad L_{\hat{K}}\Omega = 0 \qquad (5.25) $$

where $L$ denotes the Lie derivative. Hence, by applying (a local version of)
Proposition 5.3.1 to the (local) flow of $\hat{K}$ we get the following result.

Proposition 5.3.2. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. Then for a $C^\infty$
vector field $K$ on $\mathcal{M}$ the following properties are equivalent.

(a) $K$ is in $\mathcal{G}_{\mathcal{N}}$.

(b) The flow of the canonical lift $\hat{K}$ of $K$ maps lifted rays onto lifted rays.

(c) The flow of the canonical lift $\hat{K}$ of $K$ maps lifted virtual rays onto lifted
virtual rays.

In Hamiltonian mechanics it is well-known that symmetries give rise to
constants of motion. Similarly, every element of $\mathcal{G}_{\mathcal{N}}$ is associated with a
function on $T^*\mathcal{M}$ which is constant along each lifted ray. This is shown in the
following proposition.

Proposition 5.3.3. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$ and $K \in \mathcal{G}_{\mathcal{N}}$.
Then the function $\theta(\hat{K})\colon T^*\mathcal{M} \to \mathbb{R}$ is constant along each lifted ray. Here
$\theta$ denotes the canonical one-form on $T^*\mathcal{M}$ and $\hat{K}$ denotes the canonical lift
of $K$.

Proof. We fix a point $u \in \mathcal{N}$ and a local Hamiltonian $H$ for $\mathcal{N}$ around $u$.
Then the definition of the exterior derivative $d$ implies that $(d\theta)(X_H, \hat{K}) =
X_H\big(\theta(\hat{K})\big) - \hat{K}\big(\theta(X_H)\big) - \theta\big([X_H, \hat{K}]\big)$ where $[\,\cdot\,,\,\cdot\,]$ denotes the Lie bracket of
vector fields. On the left-hand side we use the definition (4.1) of the
Hamiltonian vector field $X_H$, on the right-hand side we use (5.25). This results in
$-dH(\hat{K}) = X_H\big(\theta(\hat{K})\big)$. On $\mathcal{N}$, the left-hand side vanishes since $\hat{K}$ is tangent
to $\mathcal{N}$. Hence, the right-hand side has to vanish on $\mathcal{N}$ as well. This implies
that the function $\theta(\hat{K})$ is constant along each integral curve of $X_H$ which is
contained in $\mathcal{N}$, i.e., along each lifted ray. □
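
Proposition 5.3.3 can be checked numerically on a toy model. The following is a minimal sketch under stated assumptions, not an example from the text: the manifold is $\mathbb{R}^2$, the Hamiltonian is the hypothetical $H = \frac12\big((p_1^2 + p_2^2)/n(x^2)^2 - 1\big)$ with an index profile $n$ that depends on the second coordinate only, and all function names are illustrative. Since $H$ does not depend on $x^1$, the vector field $K = \partial/\partial x^1$ is an infinitesimal symmetry and $\theta(\hat{K}) = p_1$ should stay constant along each lifted ray.

```python
import math

def n(y):
    """Hypothetical index profile; depends on the second coordinate only."""
    return 1.0 + 0.5 * math.exp(-y * y)

def dn(y):
    """Derivative of the index profile."""
    return -y * math.exp(-y * y)

def hamiltonian(state):
    """H = ((p1^2 + p2^2) / n^2 - 1) / 2; lifted rays lie in H = 0."""
    x, y, px, py = state
    return 0.5 * ((px * px + py * py) / n(y) ** 2 - 1.0)

def hamilton_rhs(state):
    """Right-hand side of Hamilton's equations for the H above."""
    x, y, px, py = state
    inv_n2 = 1.0 / n(y) ** 2
    dpx = 0.0  # -dH/dx vanishes: translations in x are symmetries
    dpy = (px * px + py * py) * dn(y) / n(y) ** 3  # -dH/dy
    return (px * inv_n2, py * inv_n2, dpx, dpy)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    def shift(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = hamilton_rhs(state)
    k2 = hamilton_rhs(shift(state, k1, h / 2))
    k3 = hamilton_rhs(shift(state, k2, h / 2))
    k4 = hamilton_rhs(shift(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate_ray(state, h=0.01, steps=1000):
    """Follow the lifted ray through `state` for `steps` RK4 steps."""
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

def initial_state(y0=-2.0, angle=0.7):
    """Initial point on the hypersurface H = 0, i.e. |p| = n(y0)."""
    return (0.0, y0, n(y0) * math.cos(angle), n(y0) * math.sin(angle))

start = initial_state()
end = integrate_ray(start)
print("p_1 at start and end:", start[2], end[2])
print("p_2 at start and end:", start[3], end[3])
```

In this sketch only the momentum conjugate to the symmetry direction is conserved; the other momentum component changes as the ray passes through the region where $n$ varies.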

If $K \in \mathcal{G}_{\mathcal{N}}$ is represented in a local chart as $K = \partial/\partial x^n$, which is possible
locally around any point of $\mathcal{M}$ where $K$ does not vanish, the constant of
motion $\theta(\hat{K})$ is exactly the corresponding momentum coordinate, $\theta(\hat{K}) = p_n$.
For this reason $\theta(\hat{K})$ is called the momentum of the infinitesimal symmetry
$K \in \mathcal{G}_{\mathcal{N}}$.
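
As a worked illustration, consider a rotation generator on $\mathcal{M} = \mathbb{R}^2$ with Cartesian coordinates $(x, y)$. This setting is an assumption made for illustration and is not one of the examples of the text; it presupposes that $K$ lies in the symmetry algebra of the ray-optical structure at hand. Using the chart expression $\hat{K} = K^a\,\partial/\partial x^a - (\partial K^b/\partial x^a)\,p_b\,\partial/\partial p_a$ for the canonical lift, the associated momentum turns out to be the familiar angular momentum:

```latex
K = x\,\frac{\partial}{\partial y} - y\,\frac{\partial}{\partial x}
\quad\Longrightarrow\quad
\hat{K} = x\,\frac{\partial}{\partial y} - y\,\frac{\partial}{\partial x}
        + p_x\,\frac{\partial}{\partial p_y} - p_y\,\frac{\partial}{\partial p_x},
\qquad
\theta(\hat{K}) = p_a K^a = x\,p_y - y\,p_x .
```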

The fact that symmetries imply constants of motion is of particular
relevance in view of dimensional reductions. Given a ray-optical structure $\mathcal{N}$
on $\mathcal{M}$, any subgroup $G$ of the symmetry group $G_{\mathcal{N}}$ can be used to define an
equivalence relation on $\mathcal{M}$ by

$q_1 \sim q_2 \iff$ there is a $\psi \in G$ such that $q_1 = \psi(q_2)$.

If the action of $G$ on $\mathcal{M}$ satisfies some regularity conditions, the quotient space
$\hat{\mathcal{M}} = \mathcal{M}/\!\sim$ can be furnished with a manifold structure such that the natural
projection $\mathrm{pr}\colon \mathcal{M} \to \hat{\mathcal{M}}$ becomes a submersion. If $G$ is an $r$-dimensional
Lie group, Proposition 5.3.3 gives rise to $r$ constants of motion. Fixing a
value for each of them singles out a certain subclass of lifted rays of $\mathcal{N}$.
Circumstances permitting, this subclass of lifted rays projects onto a reduced
ray-optical structure $\hat{\mathcal{N}}$ on $\hat{\mathcal{M}}$. We shall discuss this reduction formalism in
full detail for stationary ray-optical structures on Lorentzian manifolds in
Sect. 6.5 below. Relevant background material on the general features of the
reduction formalism can be found in Chap. 4 of the book by Abraham and
Marsden [1].

The possibility to use symmetries for dimensional reduction is an impor- 
tant motivation for considering ray-optical structures on bare manifolds of 
unspecified dimension. 

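We end this section with some remarks on the isotropy subgroup of
$G_{\mathcal{N}}$. This is defined, for each $q \in \mathcal{M}$, by

$$ G_{\mathcal{N}}^{\,q} = \{\, \psi \in G_{\mathcal{N}} \mid \psi(q) = q \,\} . \qquad (5.26) $$

For any $\psi \in G_{\mathcal{N}}^{\,q}$, the cotangent map $T^*\psi$ restricted to $T_q^*\mathcal{M}$ gives us a
linear automorphism $T_q^*\psi\colon T_q^*\mathcal{M} \to T_q^*\mathcal{M}$ that leaves the manifold $\mathcal{N}_q =
\mathcal{N} \cap T_q^*\mathcal{M}$ invariant. We introduce the following definition.

Definition 5.3.3. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$ and fix a point
$q \in \mathcal{M}$. By definition, the structure group of $\mathcal{N}$ at $q$ is the set of all linear
automorphisms $T_q^*\mathcal{M} \to T_q^*\mathcal{M}$ that leave the manifold $\mathcal{N}_q = \mathcal{N} \cap T_q^*\mathcal{M}$
invariant.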

In the case of Example 5.1.2, the structure group at $q$ consists of all
Lorentz transformations of the metric $g_o|_q$, whereas in the case of
Example 5.1.1 it contains the multiplications with nonzero numbers in addition.
In the case of Example 5.1.4, the structure group at $q$ is represented by all
invertible matrices of the form

$$ \begin{pmatrix} A_1^1 & \cdots & A_1^n \\ \vdots & & \vdots \\ A_{n-1}^1 & \cdots & A_{n-1}^n \\ 0 & \cdots\ \ 0 & A_n^n \end{pmatrix} \qquad (5.27) $$

in a basis such that $U^a(q) = \delta^a_n$, whereas in the case of Example 5.1.3 the
equation = 1 must be satisfied in addition. Finally, in the case of Exam- 
ple 5.1.5, the structure group at q consists of all linear automorphisms that 
are orthogonal with respect to the metric $g_+|_q$.

As long as $\mathcal{M}$ is a bare manifold, the only distinguished linear
automorphisms on $T_q^*\mathcal{M}$ are dilations and inversions, i.e., multiplications with
positive or negative numbers. The question of whether or not a ray-optical
structure is invariant under dilations and inversions is of particular relevance.
Therefore we devote the next subsection to this question.



5.4 Dilation-invariant ray-optical structures 

The notion of dilation-invariance for ray-optical structures will give us a 
mathematically elegant characterization of media which are dispersion-free. 
For this reason the following definition is of paramount importance in ray 
optics. 

Definition 5.4.1. A ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ is called
dilation-invariant at a point $q \in \mathcal{M}$ if $e^t u \in \mathcal{N}_q$ for all $u \in \mathcal{N}_q$ and $t \in \mathbb{R}$. $\mathcal{N}$ is called
reversible at a point $q \in \mathcal{M}$ if $-u \in \mathcal{N}_q$ for all $u \in \mathcal{N}_q$. $\mathcal{N}$ is called
dilation-invariant (or reversible, resp.) if it is dilation-invariant (or reversible, resp.)
at all points $q \in \mathcal{M}$.

If $\mathcal{M}$ is to be interpreted as a general-relativistic spacetime, a
dilation-invariant ray-optical structure on $\mathcal{M}$ is called dispersion-free or non-dispersive,
otherwise it is called dispersive. In Chap. 6 below we shall link up this
terminology with the physics textbook definition of non-dispersive media, i.e.,
we shall use the notions of phase velocity and group velocity for characterizing
dilation-invariant ray-optical structures. These notions refer to a time-like
vector field; hence, they can only be introduced if there is a Lorentzian metric
on $\mathcal{M}$, since otherwise we do not know what is meant by “time-like”.

A brief look at our standard examples shows the following. Whereas
Examples 5.1.1 and 5.1.3 are dilation-invariant and reversible, Examples 5.1.2,
5.1.4, and 5.1.5 are only reversible but not dilation-invariant.

For each $t \in \mathbb{R}$, the dilation

$$ \Phi_t\colon \mathring{T}^*\mathcal{M} \to \mathring{T}^*\mathcal{M}\,, \quad u \mapsto e^t u \qquad (5.28) $$

is represented in a natural chart by the map $(x,p) \mapsto (x, e^t p)$. With the
help of this representation it is readily verified that $\Phi_t$ leaves the canonical
one-form $\theta = p_a\,dx^a$ invariant up to a factor,

$$ \Phi_t^*\theta = e^t\,\theta . \qquad (5.29) $$

Application of the exterior derivative $d$ yields

$$ \Phi_t^*\Omega = e^t\,\Omega . \qquad (5.30) $$

Applying Proposition 5.3.1 to the map $\Psi = \Phi_t$ proves the following.

Proposition 5.4.1. For a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$, the following
properties are mutually equivalent.

(a) $\mathcal{N}$ is dilation-invariant.

(b) Each lifted ray $\xi\colon I \to \mathcal{N}$ remains a lifted ray if it is multiplied pointwise
with a positive number, $\xi(\,\cdot\,) \mapsto e^t\,\xi(\,\cdot\,)$.

(c) Each lifted virtual ray $\xi\colon I \to \mathcal{N}$ remains a lifted virtual ray if it is
multiplied pointwise with a positive number, $\xi(\,\cdot\,) \mapsto e^t\,\xi(\,\cdot\,)$.

Similarly, the inversion $\chi\colon \mathring{T}^*\mathcal{M} \to \mathring{T}^*\mathcal{M}$ is represented in each natural
chart by the map $(x,p) \mapsto (x,-p)$. This implies that $\chi^*\theta = -\theta$ and $\chi^*\Omega =
-\Omega$. Hence, Proposition 5.3.1 can be applied to the map $\Psi = \chi$ as well,
resulting in the following proposition.

Proposition 5.4.2. For a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$, the following
properties are mutually equivalent.

(a) $\mathcal{N}$ is reversible.

(b) Each lifted ray $\xi\colon I \to \mathcal{N}$ remains a lifted ray if it is pointwise inverted,
$\xi(\,\cdot\,) \mapsto -\xi(\,\cdot\,)$.

(c) Each lifted virtual ray $\xi\colon I \to \mathcal{N}$ remains a lifted virtual ray if it is
pointwise inverted, $\xi(\,\cdot\,) \mapsto -\xi(\,\cdot\,)$.

The dilation-invariant case can also be characterized in terms of the vector
field that generates the one-parameter group of dilations. Since (5.28) satisfies
all properties of a global $C^\infty$ flow on $\mathring{T}^*\mathcal{M}$, we can define a vector field
$E$ on $\mathring{T}^*\mathcal{M}$ by

$$ E_u = \frac{d}{dt}\big( e^t u \big)\Big|_{t=0} \qquad (5.31) $$

for all $u \in \mathring{T}^*\mathcal{M}$. The integral curves of $E$ are the radial lines in the fibers of
the cotangent bundle. This vector field $E$ is known as the Euler vector field
or Liouville vector field on $\mathring{T}^*\mathcal{M}$. In a natural chart $E$ takes the form

$$ E = p_a \frac{\partial}{\partial p_a} . \qquad (5.32) $$

(5.29) and (5.30) imply that the Lie derivatives of the canonical one-form
and of the canonical two-form with respect to the Euler vector field satisfy

$$ L_E\theta = \theta \quad\text{and}\quad L_E\Omega = \Omega . \qquad (5.33) $$

Moreover, the identities

$$ \theta(E) = 0 \quad\text{and}\quad -\Omega(E,\,\cdot\,) = \theta \qquad (5.34) $$

are easily verified in a natural chart. In terms of the Euler vector field $E$
dilation-invariant ray-optical structures are characterized by the following
proposition.

Proposition 5.4.3. A ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ is dilation-invariant
if and only if the Euler vector field $E$ is tangent to $\mathcal{N}$ at all points of $\mathcal{N}$.

Proof. The “only if” part follows directly from Definition 5.4.1. For the “if”
part one has to use the fact that $\mathcal{N}$ is closed in $\mathring{T}^*\mathcal{M}$ in addition. □

In terms of (local) Hamiltonians dilation-invariance can be characterized
in the following way.

Proposition 5.4.4. A ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ is dilation-invariant if
and only if any local Hamiltonian $H$ for $\mathcal{N}$ satisfies the equation $dH(E) = 0$
on $\mathcal{N}$.

The proof follows immediately from the definitions. By (5.32), the
equation $dH(E) = 0$ takes the form

$$ p_a\,\frac{\partial H}{\partial p_a}(x,p) = 0 \qquad (5.35) $$

in a natural chart. (5.35) is certainly satisfied on $\mathcal{N}$ if $H$ is a homogeneous
function (of any degree) with respect to the momentum coordinates $p_a$.
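
The homogeneity remark can be made explicit with Euler's theorem on homogeneous functions. The short derivation below is a sketch; the degree $k$ and the quadratic sample Hamiltonian built from a metric $g_o$ are illustrative assumptions, and it uses that a local Hamiltonian vanishes on $\mathcal{N}$.

```latex
H(x, \lambda p) = \lambda^{k} H(x, p) \ \text{ for all } \lambda > 0
\;\Longrightarrow\;
p_a \frac{\partial H}{\partial p_a}(x,p) = k\, H(x,p)
\quad (\text{differentiate with respect to } \lambda \text{ at } \lambda = 1),
\\
\text{so that } dH(E) = k\,H = 0 \ \text{ on } \mathcal{N}, \text{ where } H = 0 .
\\
\text{Example: } H = \tfrac12\, g_o^{ab}(x)\, p_a p_b
\;\Longrightarrow\;
p_a \frac{\partial H}{\partial p_a} = g_o^{ab}(x)\, p_a p_b = 2H .
```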
If $\mathcal{N}$ is a dilation-invariant ray-optical structure and $u \in \mathcal{N}$, then the
whole radial line $\{\, e^t u \mid t \in \mathbb{R} \,\}$ must be in $\mathcal{N}$. This implies that $\mathcal{N}_q = \mathcal{N} \cap T_q^*\mathcal{M}$
necessarily has a non-void intersection with each neighborhood of the origin
in $T_q^*\mathcal{M}$. Hence, $\mathcal{N}_q$ cannot be closed in $T_q^*\mathcal{M}$. Typically, the closure $\mathcal{N}_q \cup \{0\}$
of $\mathcal{N}_q$ in $T_q^*\mathcal{M}$ fails to be a smooth manifold at the origin but forms something
like a tip or a vertex there, as in our Example 5.1.1, see Figure 5.1. For a
dilation-invariant ray-optical structure, $\mathcal{N}_q \cup \{0\}$ is a smooth manifold at the
origin if and only if it is a hyperplane. This situation is encountered in our
pathological Example 5.1.3, see Figure 5.3 (a).

Proposition 5.4.4 can be used to further characterize dilation-invariant 
ray-optical structures in the following way. 

Proposition 5.4.5. Let $\mathcal{N}$ be a dilation-invariant ray-optical structure on
$\mathcal{M}$. Fix a point $u \in \mathcal{N}$ and, on some open neighborhood $W$ of $u$ in $\mathring{T}^*\mathcal{M}$, a
natural chart $(x,p)$ and a local Hamiltonian $H$ for $\mathcal{N}$. Then there is a $c \in \mathbb{R}$
such that

$$ \begin{pmatrix} (H^{ab}) & (H^{a}) \\ (H^{b}) & 0 \end{pmatrix} \begin{pmatrix} (p_b) \\ c \end{pmatrix} = \begin{pmatrix} (0) \\ 0 \end{pmatrix} \qquad (5.36) $$

at $u$. Here we use the abbreviations (4.7) and the same matrix notation as
in (5.15).

Proof. By assumption, $H$ has to satisfy equation (5.35) at all points of $\mathcal{N} \cap W$.
This gives the last row of the matrix equation (5.36). Now consider the set
of all vertical vectors $Z_a\,\partial/\partial p_a$ at the point $u$. Such a vector is tangent to $\mathcal{N}$ if
and only if it satisfies $H^a Z_a = 0$. In this case it can be applied to equation
(5.35) as a derivation, resulting in $H^{ab}\,p_b\,Z_a = 0$ at $u$. This shows that the
equation $H^{ab}\,p_b = -c\,H^a$ holds at $u$ with some real number $c$. As the last
row of (5.36) was already verified, this completes the proof. □

Recalling Definition 5.2.2 of strong regularity, this proposition has the
following important consequence.

Corollary 5.4.1. A dilation-invariant ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ cannot
be strongly regular at any point $u \in \mathcal{N}$.

Proof. Since at least one of the momentum coordinates must be non-zero at
$u$, (5.36) implies that the $(n+1) \times (n+1)$ matrix on the left-hand side has
zero determinant. On the other hand, non-vanishing of this determinant was
the defining property of strong regularity according to Definition 5.2.2. □

This corollary says that, if $\mathcal{M}$ is a spacetime and $\mathcal{N}$ describes light
propagation in a medium on this spacetime, strong regularity can hold only if the
medium is dispersive.
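
The determinant criterion can be tried out on two toy Hamiltonians. The following is a minimal sketch under stated assumptions: $n = 2$, the inverse metric is the Minkowski one, and the Hamiltonians $H = \frac12\,\eta^{ab}p_a p_b$ (dilation-invariant) and $H = \frac12(\eta^{ab}p_a p_b + 1)$ (not dilation-invariant) are assumed forms chosen for illustration. The abbreviations $H^a = \partial H/\partial p_a$ and $H^{ab} = \partial^2 H/\partial p_a \partial p_b$ are likewise assumed, and the helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def bordered_matrix(g_inv, p):
    """Bordered matrix ((H^ab), (H^a); (H^b), 0) for H = (g^ab p_a p_b + const)/2.

    For such an H the second derivatives H^ab equal g^ab and the first
    derivatives are H^a = g^ab p_b; the additive constant drops out.
    """
    grad = [sum(g_inv[a][b] * p[b] for b in range(2)) for a in range(2)]
    return [
        [g_inv[0][0], g_inv[0][1], grad[0]],
        [g_inv[1][0], g_inv[1][1], grad[1]],
        [grad[0], grad[1], 0.0],
    ]

ETA_INV = [[-1.0, 0.0], [0.0, 1.0]]  # inverse Minkowski metric, n = 2

# Null covector: lies on eta^ab p_a p_b = 0 (dilation-invariant case).
det_null = det3(bordered_matrix(ETA_INV, [1.0, 1.0]))

# Covector with eta^ab p_a p_b = -1 (assumed non-dilation-invariant case).
det_timelike = det3(bordered_matrix(ETA_INV, [1.0, 0.0]))

print("determinant on the null cone:", det_null)
print("determinant on the hyperboloid:", det_timelike)
```

In the first case the vector $(p_1, p_2, c) = (1, 1, -1)$ lies in the kernel of the bordered matrix, in line with the structure of (5.36); in the second case the determinant is non-zero.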

Moreover, with the help of Proposition 5.4.5 it is easy to verify that the
fiber derivative of a (local) Hamiltonian of a dilation-invariant ray-optical
structure maps radial lines $\{\, e^t u \mid t \in \mathbb{R} \,\}$ to radial lines $\{\, e^t\,\mathbb{F}H(u) \mid t \in \mathbb{R} \,\}$
for $u \in \mathcal{N}$. This implies that, for a dilation-invariant ray-optical structure,
the infinitesimal light cone $\mathcal{C}_q$ is a closed subset of $\mathring{T}_q\mathcal{M}$ and that it has
codimension $\geq 1$ if it is a submanifold. (Please recall Definition 5.2.3 and the
subsequent discussion.) This general result is exemplified by Example 5.1.1
and Example 5.1.3. On the other hand, transversality of $\mathcal{N}$ to the flow of the
Euler vector field is not sufficient for the infinitesimal light cones to be open.
This is demonstrated by Example 5.1.4.

The following proposition gives a pointwise characterization of dilation- 
invariant ray-optical structures that will be of relevance later. 

Proposition 5.4.6. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. Then, for any
point $u \in \mathcal{N}$, the following properties are mutually equivalent.

(a) The Euler vector field $E$ is tangent to $\mathcal{N}$ at the point $u$.

(b) Every characteristic vector field $X$ on $\mathcal{N}$ satisfies $\theta_{\mathcal{N}}(X) = 0$ at $u$.

(c) $\theta_{\mathcal{N}} \wedge \Omega_{\mathcal{N}}^{\,n-1} = 0$ at $u$.

Here $\Omega_{\mathcal{N}}^{\,n-1}$ means the antisymmetrized tensor product $\Omega_{\mathcal{N}} \wedge \cdots \wedge \Omega_{\mathcal{N}}$ with
$(n-1)$ factors.

Proof. To prove the equivalence of (a) and (b) we recall that locally every
characteristic vector field on $\mathcal{N}$ is the Hamiltonian vector field of a local
Hamiltonian for $\mathcal{N}$. Thus, (b) holds true if and only if $\theta(X_H) = 0$ at $u$
for any local Hamiltonian $H$. Owing to the second identity given in (5.34)
and the definition (4.1) of the Hamiltonian vector field, this is equivalent to
$dH(E) = 0$ at $u$ for any local Hamiltonian $H$, i.e., it is equivalent to (a).

Now we prove the equivalence of (a) and (c). It is obvious that, at each
point of $\mathring{T}^*\mathcal{M}$, $\theta \wedge \Omega^{n-1}$ is a non-zero $(2n-1)$-form and has, thus, a
one-dimensional kernel. From (5.34) we read that this kernel is spanned by the
Euler vector field $E$. Hence the pull-back of $\theta \wedge \Omega^{n-1}$ to $\mathcal{N}$ vanishes exactly
at those points where $E$ is tangent to $\mathcal{N}$. □

According to this proposition, $\theta_{\mathcal{N}} \wedge \Omega_{\mathcal{N}}^{\,n-1}$ has no zeros if $\mathcal{N}$ is
everywhere transverse to the flow of the Euler vector field. In this case $(\mathcal{N}, \theta_{\mathcal{N}})$
is an exact contact manifold in the terminology of Abraham and Marsden
[1], Definition 5.1.4. In other words, transversality of $\mathcal{N}$ to the flow of the
Euler vector field guarantees that $\mathcal{N}$ is orientable and that there is even a
canonical volume form, viz. $\theta_{\mathcal{N}} \wedge \Omega_{\mathcal{N}}^{\,n-1}$, on $\mathcal{N}$. By Proposition 5.1.4, this
implies in particular the existence of a global Hamiltonian for such a ray-optical
structure.

Proposition 5.4.6 has the following important consequence for lifted
virtual rays. (Please recall Definition 5.2.4.)

Proposition 5.4.7. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. A lifted virtual
ray $\xi\colon I \to \mathcal{N}$ satisfies the equation

$$ \theta_{\xi(s)}\big(\dot{\xi}(s)\big) = 0 \qquad (5.37) $$

at the parameter value $s \in I$ if and only if the Euler vector field $E$ is tangent
to $\mathcal{N}$ at the point $\xi(s)$.

Proof. By Proposition 5.2.3, the tangent vector of a lifted virtual ray is the
sum of a characteristic vector tangent to $\mathcal{N}$ and a vertical vector. Since all
vertical vectors are in the kernel of the canonical one-form, the statement
now follows from the equivalence of (a) and (b) in Proposition 5.4.6. □

In Hamiltonian mechanics, integration over the canonical one-form gives
the so-called action functional. In this terminology, Proposition 5.4.7 says that
for a dilation-invariant ray-optical structure the action functional vanishes on
all lifted virtual rays. This observation will be of great importance for our
discussion of variational principles in Chap. 7 below.

If, on the other hand, the Euler vector field is everywhere transverse
to $\mathcal{N}$, Proposition 5.4.7 guarantees that every lifted virtual ray admits a
reparametrization $\xi$ such that

$$ \theta_{\xi(s)}\big(\dot{\xi}(s)\big) = 1 . \qquad (5.38) $$

This gives a canonical parametrization for each lifted virtual ray $\xi$ which
is unique up to an additive constant, $\xi(s) \mapsto \xi(s + s_0)$. In the case of
Example 5.1.5 this distinguished parametrization gives $g_+$-arc length along
each virtual ray $\lambda = \tau^*_{\mathcal{M}} \circ \xi$, i.e., $g_+(\dot{\lambda},\dot{\lambda}) = 1$. Similarly, in the case of
Example 5.1.2 the distinguished parametrization gives $g_o$-proper time along
each virtual ray (= $g_o$-time-like curve). Finally, in the case of Example 5.1.4
the distinguished parametrization is adapted to $U$ or adapted to $-U$ along
each virtual ray (= integral curve of $U$) $\lambda = \tau^*_{\mathcal{M}} \circ \xi$, i.e., $\dot{\lambda} = \pm\, U \circ \lambda$.

No such distinguished parametrization exists if $\mathcal{N}$ is dilation-invariant
since then every lifted virtual ray satisfies equation (5.37).
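
The two cases can be contrasted in a short worked computation for lifted rays parametrized as integral curves of $X_H$. The Hamiltonians below are assumed forms chosen for illustration: a Riemannian one built from $g_+$ and a vacuum-type one built from $g_o$; the definitions of the examples in the text may differ in detail.

```latex
H = \tfrac12\big(g_+^{ab}(x)\,p_a p_b - 1\big)
\;\Longrightarrow\;
\theta(X_H) = p_a \frac{\partial H}{\partial p_a} = g_+^{ab} p_a p_b = 1 \ \text{ on } \mathcal{N},
\\
\dot{x}^a = \frac{\partial H}{\partial p_a} = g_+^{ab} p_b
\;\Longrightarrow\;
g_{+\,ab}\,\dot{x}^a \dot{x}^b = g_+^{ab} p_a p_b = 1 \quad (\text{arc length}),
\\
H = \tfrac12\, g_o^{ab}(x)\,p_a p_b
\;\Longrightarrow\;
\theta(X_H) = g_o^{ab} p_a p_b = 0 \ \text{ on } \mathcal{N}.
```

In the first case the normalization $\theta(\dot{\xi}) = 1$ holds automatically; in the second, homogeneous case no rescaling of the parameter can achieve it.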



5.5 Eikonal equation 

From the examples studied in Part I we know that families of rays are
associated with families of wave surfaces. In this chapter we want to study the
relation of rays and wave surfaces in our general geometrical setting. In par- 
ticular we want to introduce, for arbitrary ray-optical structures in the sense 
of Definition 5.1.1, an eikonal equation by which the dynamics of wave sur- 
faces is determined. It is largely a matter of taste whether one considers rays 
as more fundamental than wave surfaces or vice versa. Our intuitive ideas of 
light propagation are normally based on the notion of rays, rather than on 
the notion of wave surfaces. On the other hand, the derivation of ray optics 
from Maxwell’s equations leads to the eikonal equation first and to the ray 
equation at a later stage, as we have seen in Part I. 

Formally the eikonal equation of a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ can
be introduced as the Hamilton-Jacobi equation determined by any (local)
Hamiltonian for $\mathcal{N}$. More precisely, we say that a $C^\infty$ function $S\colon U \to \mathbb{R}$,
defined on some open subset $U$ of $\mathcal{M}$, is a classical solution of the eikonal
equation of $\mathcal{N}$ iff it satisfies the equation

$$ H\big(dS(q)\big) = 0 \quad \text{for all } q \in U . \qquad (5.39) $$

Here the differential of the function $S$ is to be viewed as a local section in the
cotangent bundle, i.e., as a map $dS\colon U \to T^*U \subseteq T^*\mathcal{M}$, and $H$ denotes any
local Hamiltonian for $\mathcal{N}$ whose domain of definition $W \subseteq \mathring{T}^*\mathcal{M}$ covers the
point $dS(q)$. In a natural chart the eikonal equation (5.39) takes the more
familiar form

$$ H\big(x, \partial S(x)\big) = 0 . \qquad (5.40) $$

(5.39) can be rewritten without any reference to local Hamiltonians as

$$ dS(U) \subseteq \mathcal{N} \qquad (5.41) $$

where $dS(U)$ denotes the image of the section $dS\colon U \to T^*U \subseteq T^*\mathcal{M}$.
Since $\mathcal{N}$ is a subset of $\mathring{T}^*\mathcal{M}$, (5.41) automatically guarantees that $dS$ has
no zeros. Hence, the function $S$ determines a foliation (or “slicing”) of the
open subset $U$ of $\mathcal{M}$ into smooth hypersurfaces $S = \mathrm{const.}$, which are called
wave surfaces (or eikonal surfaces, or phase surfaces). The motivation for
this terminology comes, of course, from the approximate-plane-wave method
outlined in Sect. 2.2.
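
A classical solution can be checked numerically by evaluating a local Hamiltonian on the differential of a candidate function. The following is a minimal sketch under stated assumptions: the manifold is (2+1)-dimensional Minkowski space, the Hamiltonian is the vacuum-type $H = \frac12\,\eta^{ab}p_a p_b$ with signature $(-,+,+)$, and the phase function of an outgoing circular wave is a hypothetical test case; none of these choices are taken from the text, and all names are illustrative.

```python
import math

def S(t, x, y):
    """Hypothetical phase function of an outgoing circular wave."""
    return math.hypot(x, y) - t

def gradient(f, point, h=1e-5):
    """Central finite-difference approximation of the differential of f."""
    grad = []
    for i in range(len(point)):
        fwd = list(point)
        bwd = list(point)
        fwd[i] += h
        bwd[i] -= h
        grad.append((f(*fwd) - f(*bwd)) / (2 * h))
    return grad

def vacuum_hamiltonian(p):
    """H = (1/2) eta^{ab} p_a p_b with signature (-, +, +)."""
    pt, px, py = p
    return 0.5 * (-pt * pt + px * px + py * py)

def eikonal_residual(point):
    """Value of H(dS(q)); it vanishes where S solves the eikonal equation."""
    return vacuum_hamiltonian(gradient(S, point))

q = (0.3, 1.0, 2.0)  # a point away from the spatial origin, where S is smooth
print("dS(q) =", gradient(S, q))
print("H(dS(q)) =", eikonal_residual(q))
```

The residual is zero up to discretization error, and $dS$ has no zeros on the domain, so the level sets $S = \mathrm{const.}$ are smooth hypersurfaces there.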

In the case of Example 5.1.1 a wave surface is a $g_o$-light-like hypersurface
whereas it is a $g_o$-space-like hypersurface in the case of Example 5.1.2. In
the case of Example 5.1.3 a wave surface is foliated into integral curves of
the vector field $U$ whereas it is transverse to $U$ in the case of Example 5.1.4.
Finally, in the case of Example 5.1.5 a wave surface is a completely arbitrary
hypersurface.

The eikonal equation can be viewed analytically as a partial differen- 
tial equation for a function S. As an alternative, suggested by (5.41), the 
eikonal equation can be viewed geometrically as the problem of finding an 
n-dimensional manifold (with certain properties) that is contained in a given 
$(2n-1)$-dimensional manifold. Henceforth we take the geometrical point of
view which is of great advantage for global questions. In other words, we turn
our attention away from the function $S$ and concentrate upon the manifold
$dS(U)$.

For any $C^\infty$ function $S\colon U \to \mathbb{R}$, $dS(U)$ is an $n$-dimensional $C^\infty$
submanifold of $T^*\mathcal{M}$ and it is transverse to the fibers. Moreover, $dS(U)$ is a
so-called “Lagrangian submanifold” of the symplectic manifold $(T^*\mathcal{M}, \Omega)$.
This notion, which will be at the center of this section, is defined in the
following way.

Definition 5.5.1. Let $\mathcal{L} \subseteq T^*\mathcal{M}$ be an embedded $C^\infty$ submanifold of $T^*\mathcal{M}$
and denote the inclusion map by $j\colon \mathcal{L} \to T^*\mathcal{M}$.

(a) $\mathcal{L}$ is called isotropic iff the pull-back with $j$ of the canonical two-form $\Omega$
vanishes, $j^*\Omega = 0$.

(b) $\mathcal{L}$ is called Lagrangian iff $\mathcal{L}$ is isotropic and $\dim(\mathcal{L}) = \dim(\mathcal{M})$.

Definition 5.5.1 admits an obvious generalization for immersed, rather
than embedded, submanifolds of $T^*\mathcal{M}$.

The non-degeneracy of $\Omega$ immediately implies that an isotropic
submanifold of $T^*\mathcal{M}$ must have dimension $\leq \dim(\mathcal{M})$. Thus, Lagrangian
submanifolds are isotropic submanifolds of maximal dimension. The name
“Lagrangian submanifold” was introduced by Maslov and Arnold in the 1960s.
It refers to the following characterization of such submanifolds in terms of
the classical Lagrange brackets. Consider a $k$-dimensional embedded
submanifold $\mathcal{L}$ of $T^*\mathcal{M}$; let $(u_1, \ldots, u_k)$ be any local chart on the manifold $\mathcal{L}$ and
let $(x^1, \ldots, x^n, p_1, \ldots, p_n)$ be a natural chart (or, more generally, a canonical
chart) on $T^*\mathcal{M}$. Then the classical Lagrange brackets are defined as

$$ (u_A, u_B) = \frac{\partial x^a}{\partial u_A}\frac{\partial p_a}{\partial u_B} - \frac{\partial x^a}{\partial u_B}\frac{\partial p_a}{\partial u_A} \qquad (5.42) $$

for $A, B = 1, \ldots, k$. Clearly, the same expression can be written without any
reference to a natural (or canonical) chart as

$$ (u_A, u_B) = (j^*\Omega)\Big(\frac{\partial}{\partial u_A}, \frac{\partial}{\partial u_B}\Big) \qquad (5.43) $$

for $A, B = 1, \ldots, k$ where $j^*\Omega$ denotes the pull-back of $\Omega$ with the inclusion
map $j\colon \mathcal{L} \to T^*\mathcal{M}$. From (5.43) it is obvious that the Lagrange brackets
vanish for all $A, B = 1, \ldots, k$ if and only if the submanifold $\mathcal{L}$ is isotropic. In
other words, Lagrangian submanifolds are submanifolds of maximal
dimension for which the Lagrange brackets vanish identically.
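
For a submanifold of the form $dS(U)$ the vanishing of the Lagrange brackets reduces to the symmetry of second partial derivatives. The following worked computation is a sketch which assumes that the coordinates $x^A$ of a natural chart are used as the local chart $u_A$ on $\mathcal{L}$:

```latex
\mathcal{L} = dS(U):\qquad u_A = x^A, \qquad p_a = \frac{\partial S}{\partial x^a}(x),
\\
(u_A, u_B)
 = \frac{\partial x^a}{\partial u_A}\frac{\partial p_a}{\partial u_B}
 - \frac{\partial x^a}{\partial u_B}\frac{\partial p_a}{\partial u_A}
 = \frac{\partial^2 S}{\partial x^B\,\partial x^A}
 - \frac{\partial^2 S}{\partial x^A\,\partial x^B}
 = 0 .
```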

Properties of Lagrangian submanifolds are detailed in many articles and 
textbooks, e.g., in Weinstein [148], Guillemin and Sternberg [55], Abraham
and Marsden [1], and Woodhouse [150]. In the following two propositions 
we recall some well-known facts which are of particular relevance for us, cf. 
Figure 5.5. 

Proposition 5.5.1. Let $S\colon U \to \mathbb{R}$ be a $C^\infty$ function defined on an open
subset $U$ of $\mathcal{M}$. Then $\mathcal{L} = dS(U)$ is an embedded Lagrangian submanifold of
$T^*\mathcal{M}$ which is everywhere transverse to the fibers.

Proof. The only non-trivial claim is that $\mathcal{L}$ is isotropic. To prove this, we
recall that the canonical one-form $\theta$ on $T^*\mathcal{M}$ satisfies $\beta^*\theta = \beta$ where $\beta$ is
any local section in $T^*\mathcal{M}$. Hence, $(dS)^*\theta = dS$. Now we apply the exterior
derivative $d$ to this equation. Upon using the identity $dd = 0$ and the fact
that $d$ commutes with the pull-back operation, this results in $(dS)^*\Omega = 0$.
As the inclusion map $j\colon \mathcal{L} \to T^*\mathcal{M}$ can be written in the form $j = dS \circ \tau^*_{\mathcal{M}}|_{\mathcal{L}}$,
this implies $j^*\Omega = (\tau^*_{\mathcal{M}}|_{\mathcal{L}})^*(dS)^*\Omega = 0$. □



Proposition 5.5.2. Let $\mathcal{L}$ be an $n$-dimensional embedded Lagrangian $C^\infty$
submanifold of $T^*\mathcal{M}$ which is everywhere transverse to the fibers. Then $\mathcal{L}$
can be represented, locally around each of its points, in the form $\mathcal{L} = dS(U)$
where $S\colon U \to \mathbb{R}$ is a function defined on an open subset $U$ of $\mathcal{M}$.
Moreover, if $\mathcal{L}$ is simply connected, $\mathcal{L}$ can be globally represented in this way.
$S$ is then called a generating function for $\mathcal{L}$.

Proof. Since $\mathcal{L}$ is an $n$-dimensional $C^\infty$ submanifold of $T^*\mathcal{M}$ transverse to the
fibers, it is the image of a local section in $T^*\mathcal{M}$. Thus, there is a (necessarily
open) subset $U$ of $\mathcal{M}$ and a one-form $\beta\colon U \to T^*U \subseteq T^*\mathcal{M}$ such that
$\mathcal{L} = \beta(U)$. Since $\mathcal{L}$ is Lagrangian, $\beta^*\Omega = 0$. On the other hand, the defining
property of the canonical one-form $\theta$ on $T^*\mathcal{M}$ guarantees that $\beta^*\theta = \beta$
and, thus, $\beta^*\Omega = -d\beta$. Comparison of these two results gives $d\beta = 0$. If $U$
is simply connected, this implies that $\beta$ is of the form $\beta = dS$ with some
function $S\colon U \to \mathbb{R}$ which is unique up to an additive constant. ($S$ can be
defined by fixing a point $q'$ in $U$ and setting $S(q) = \int_{q'}^{q} \beta$ where the integral
is to be performed along any path from $q'$ to $q$. If $U$ is simply connected,
the equation $d\beta = 0$ guarantees that the result is independent of the path
chosen. This follows from the well-known Stokes theorem, see, e.g., Abraham
and Marsden [1], p. 138.) Since $U$ is simply connected if and only if $\mathcal{L}$ is
simply connected, this proves the second claim. The first claim follows from
the fact that each point in $U$ has a simply connected neighborhood. □

Fig. 5.5. A Lagrangian submanifold $\mathcal{L}$ of $T^*\mathcal{M}$ which is everywhere transverse to
the fibers is locally generated by a function $S$ on $\mathcal{M}$, i.e., $\mathcal{L} = dS(U)$.

These two propositions have the following consequences, which are
illustrated in Figure 5.5. Any classical solution $S\colon U \to \mathbb{R}$ of the eikonal equation
determines a Lagrangian submanifold $\mathcal{L} = dS(U)$ of $T^*\mathcal{M}$ which is transverse
to the fibers and completely contained in $\mathcal{N}$. Conversely, to any Lagrangian
submanifold $\mathcal{L}$ of $T^*\mathcal{M}$ which is transverse to the fibers and completely
contained in $\mathcal{N}$ we can find, on each simply connected subset $U$ of $\tau^*_{\mathcal{M}}(\mathcal{L})$, a
classical solution $S$ of the eikonal equation such that $\mathcal{L} \cap (\tau^*_{\mathcal{M}})^{-1}(U) = dS(U)$; this
solution $S$ is unique up to an additive constant. These observations suggest
the following definition.

Definition 5.5.2. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. A generalized
solution of the eikonal equation of $\mathcal{N}$ is a Lagrangian $C^\infty$ submanifold $\mathcal{L}$ of
$T^*\mathcal{M}$ which is completely contained in $\mathcal{N}$.

The following proposition guarantees that any generalized solution of the
eikonal equation determines an $(n-1)$-parameter family of lifted rays.

Proposition 5.5.3. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$ and let $\mathcal{L}$ be an
embedded Lagrangian $C^\infty$ submanifold of $T^*\mathcal{M}$ which is completely contained
in $\mathcal{N}$. Then $\mathcal{L}$ is foliated into lifted rays.

Proof. Fix a point $u \in \mathcal{L}$ and let $X_u \in T_u\mathcal{N} \subseteq T_u(\mathring{T}^*\mathcal{M})$ be a characteristic
vector for $\mathcal{N}$, i.e., $(\Omega_{\mathcal{N}})_u(X_u,\,\cdot\,) = 0$. This implies that $\Omega_u(X_u, Z_u) = 0$ for
all vectors $Z_u \in T_u\mathcal{L} \subseteq T_u\mathcal{N}$. Since $\mathcal{L}$ is Lagrangian, i.e., maximally isotropic,
this can be true only if $X_u \in T_u\mathcal{L}$. In other words, at all points $u \in \mathcal{L}$ the
characteristic direction of $\mathcal{N}$ must be tangent to $\mathcal{L}$. Hence, $\mathcal{L}$ must be foliated
into integral curves of characteristic vector fields. □

Fig. 5.6. A generalized solution $\mathcal{L}$ to the eikonal equation can be constructed,
according to Proposition 5.5.4, by applying the characteristic flow to an appropriately
chosen isotropic submanifold $\mathcal{V}$.

If we combine this observation with Proposition 5.5.1, we find that each
classical solution $S\colon U \to \mathbb{R}$ of the eikonal equation is associated with a
congruence of rays on $U$. Those rays are the projections to $\mathcal{M}$ of the lifted
rays into which $\mathcal{L} = dS(U)$ is foliated. (In terms of a local Hamiltonian and a
natural chart this construction is already known to us from Sect. 2.4.) In other
words, a classical solution $S\colon U \to \mathbb{R}$ of the eikonal equation determines,
on its domain of definition $U \subseteq \mathcal{M}$, not only a “slicing” into wave surfaces
$S = \mathrm{const.}$ but also a “threading” into rays. Later in this section we shall
inquire whether rays and wave surfaces are transverse to each other.

The following proposition gives a construction method for generalized
solutions of the eikonal equation, cf. Figure 5.6.

Proposition 5.5.4. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. Fix an
$(n-1)$-dimensional embedded $C^\infty$ submanifold $\mathcal{V}$ of $T^*\mathcal{M}$ such that $\mathcal{V}$ is completely
contained in $\mathcal{N}$, isotropic, and non-characteristic. By the latter condition we
mean that, at all points $u \in \mathcal{V} \subseteq \mathcal{N}$, the characteristic direction of $\mathcal{N}$ is
non-tangent to $\mathcal{V}$. Let $\mathcal{L}$ be the set of all points in $T^*\mathcal{M}$ that can be connected to
a point of $\mathcal{V}$ by a lifted ray. Then $\mathcal{L}$ is a generalized solution of the eikonal
equation of $\mathcal{N}$. (In general, $\mathcal{L}$ is only an immersed but not an embedded
submanifold of $T^*\mathcal{M}$. However, if we restrict to an appropriate neighborhood
of $\mathcal{V}$ this construction always gives an embedded submanifold.)

Proof. $\mathcal{L}$ is defined as the image of $\mathcal{V}$ under the flow of a characteristic vector
field. Since $\mathcal{V}$ is $(n-1)$-dimensional and non-characteristic, $\mathcal{L}$ must be an
$n$-dimensional immersed submanifold of $\mathcal{N}$. What remains to be shown is that $\mathcal{L}$
is isotropic, i.e., that the pull-back of $\Omega$ to $\mathcal{L}$ vanishes. If $(s,u) \mapsto \Phi_s(u)$
denotes the flow of a characteristic vector field on $\mathcal{N}$, the image of $\mathcal{V}$ under
$\Phi_s$ is an isotropic submanifold (for each $s \in \mathbb{R}$ for which this image is
non-empty). This follows from the fact that the Lie derivative of $\Omega$ with respect
to a Hamiltonian vector field on $T^*\mathcal{M}$ vanishes. Hence, at each point of $\mathcal{L}$
the tangent space to $\mathcal{L}$ is spanned by the characteristic direction and by
the tangent space to an isotropic submanifold. This proves that $\mathcal{L}$ must be
isotropic. □

At the level of (local) Hamiltonians this is a standard result, cf., e.g.,
Abraham and Marsden [1], Lemma 5.3.29.

The construction of Proposition 5.5.4 can be carried through, in particular,
for the special choice $\mathcal{P} = \mathcal{N}_q = \mathcal{N} \cap T^*_q\mathcal{M}$ where $q$ is any point in
$\mathcal{M}$. Since $\mathcal{N}_q$ is completely contained in the fiber $T^*_q\mathcal{M}$ it is, indeed,
non-characteristic and isotropic. Hence the image of $\mathcal{P}$ under the characteristic
flow gives a generalized solution $\mathcal{L}$ of the eikonal equation. Clearly, this $\mathcal{L}$
cannot be transverse to the fibers at $q$. The projection of $\mathcal{L}$ to $\mathcal{M}$ gives the
set of all points in $\mathcal{M}$ that can be joined to $q$ by a ray.

On the other hand, it is also possible to choose the initial surface $\mathcal{P}$ in
such a way that the projection $\tau^*_{\mathcal{M}}$ maps $\mathcal{P}$ diffeomorphically onto an
$(n-1)$-dimensional submanifold of $\mathcal{M}$. In this case the resulting generalized
solution $\mathcal{L}$ of the eikonal equation is transverse to the fibers, and thus of the
form $dS(\mathcal{U})$, near $\mathcal{P}$. It is foliated into lifted rays which, if projected to $\mathcal{M}$,
give a congruence of rays that intersect $\tau^*_{\mathcal{M}}(\mathcal{P})$ transversely. Farther away
from $\mathcal{P}$, however, $\mathcal{L}$ need not be transverse to the fibers and neighboring
rays may intersect each other, see Figure 5.6. This shows that it is necessary
to consider generalized solutions, rather than just classical solutions, of the
eikonal equation if one wants to treat global questions.
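
To make the point-source construction $\mathcal{P} = \mathcal{N}_q$ concrete, the following sketch works it out in the simplest flat setting. The choice $\mathcal{M} = \mathbb{R}^n$ with $H(x,p) = \tfrac{1}{2}(\delta^{ab} p_a p_b - 1)$ and $k = 1$ is an assumption made here purely for illustration (a flat instance of the Riemannian situation); it is not taken from the text.

```latex
% Illustrative assumption: M = R^n, H(x,p) = (1/2)(\delta^{ab} p_a p_b - 1), k = 1,
% so the ray equations read \dot{x}^a = p^a, \dot{p}_a = 0.
\begin{align*}
\mathcal{N} &= \{ (x,p) : |p| = 1 \}, \qquad
  \text{lifted rays: } s \mapsto (x_0 + s\,p_0,\ p_0), \\
\mathcal{P} &= \mathcal{N}_q = \{ (q,p) : |p| = 1 \}, \\
\mathcal{L} &= \{ (q + s\,p,\ p) : |p| = 1,\ s \in \mathbb{R} \} .
\end{align*}
% On the branch s > 0 one has s = |x - q| and p = (x - q)/|x - q|, hence
\[
  \mathcal{L}\big|_{s>0} = dS(\mathcal{U}), \qquad
  S(x) = |x - q|, \qquad \mathcal{U} = \mathbb{R}^n \setminus \{q\} .
\]
% At s = 0 the whole (n-1)-sphere N_q lies in the single fiber over q,
% so L is not transverse to the fibers there.
```

In this sketch the classical solution $S(x) = |x - q|$ exists only away from $q$, whereas $\mathcal{L}$ itself is a perfectly smooth $n$-dimensional submanifold through the fiber over $q$.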

Proposition 5.5.4 has the following interesting consequence. 

Proposition 5.5.5. Let $\xi: I \to \mathcal{N}$ be a lifted ray of a ray-optical structure
$\mathcal{N}$ on $\mathcal{M}$ and $s \in I$. Then there is an $\varepsilon > 0$ and a classical solution
$S: \mathcal{U} \to \mathbb{R}$ of the eikonal equation for $\mathcal{N}$ such that $\xi(s') \in dS(\mathcal{U})$ for all
$s' \in \,]s - \varepsilon, s + \varepsilon[$.

Proof. Construct a generalized solution $\mathcal{L}$ of the eikonal equation according
to Proposition 5.5.4, with an initial manifold $\mathcal{P}$ that passes through the point
$\xi(s)$ and is transverse to the fibers at that point. Then $\mathcal{L}$ must be transverse
to the fibers on some neighborhood of $\xi(s)$, i.e., it can be written as the image
of a differential $dS$ on that neighborhood. As, by construction, $\xi$ is contained
in $\mathcal{L}$, this concludes the proof. □

Quite generally, Proposition 5.5.4 gives a generalized solution of the
eikonal equation in the form of an $(n-1)$-parameter family of lifted rays,
parametrized by the points of $\mathcal{P}$. It is crucial to realize that only very special
$(n-1)$-parameter families of lifted rays give rise to a generalized solution
of the eikonal equation. This special property is encoded in the condition of $\mathcal{P}$
being isotropic which guarantees that $\mathcal{L}$ is Lagrangian. This condition on the
$(n-1)$-parameter family of lifted rays can be viewed as an integrability
condition. The following proposition is helpful to clarify the geometric meaning
of this integrability condition.

Proposition 5.5.6. Let $\mathcal{L}$ be an embedded $C^\infty$ submanifold of $T^*\mathcal{M}$,
denote the inclusion map by $j: \mathcal{L} \to T^*\mathcal{M}$ and let $\theta_{\mathcal{L}} = j^*\theta$ and $\Omega_{\mathcal{L}} = j^*\Omega$.
Then the following two statements are equivalent.

(a) $\Omega_{\mathcal{L}} = 0$, i.e., $\mathcal{L}$ is an isotropic submanifold of $T^*\mathcal{M}$.

(b) On every simply connected open subset $\mathcal{U}$ of $\mathcal{L}$ there is a $C^\infty$ function
$S: \mathcal{U} \to \mathbb{R}$, unique up to an additive constant, such that $dS = \theta_{\mathcal{L}}|_{\mathcal{U}}$.

Proof. Since $\Omega = -d\theta$ and the exterior derivative $d$ commutes with the
pull-back operation, $\Omega_{\mathcal{L}} = 0$ is equivalent to $d\theta_{\mathcal{L}} = 0$. On a simply connected
subset this equation is satisfied if and only if $\theta_{\mathcal{L}}$ is the differential of a function,
cf. the proof of Proposition 5.5.2. □

As a consequence, an $n$-dimensional submanifold $\mathcal{L}$ of $\mathcal{N}$ is a generalized
solution of the eikonal equation if and only if the kernel distribution of the
one-form $\theta_{\mathcal{L}}$ is locally integrable.

If we supplement the hypotheses of Proposition 5.5.6 with the assumption
that $\theta_{\mathcal{L}}$ has no zeros, the isotropy condition $\Omega_{\mathcal{L}} = 0$ guarantees that
$\mathcal{L}$ is locally foliated into hypersurfaces $S = \mathrm{const}$. If we specialize from the
isotropic to the Lagrangian case, the situation that $\theta_{\mathcal{L}}$ has no zeros can be
characterized with the help of the Euler vector field (5.31) in the following
way.

Proposition 5.5.7. Let $\mathcal{L}$ be an embedded Lagrangian submanifold of $T^*\mathcal{M}$
and $u \in \mathcal{L}$. Then the pull-back to $\mathcal{L}$ of the canonical one-form $\theta$ has a zero
at $u$ if and only if the Euler vector field $E$ is tangent to $\mathcal{L}$ at $u$.

Proof. Let us assume that $\theta_u(X_u) = 0$ for all $X_u \in T_u\mathcal{L} \subset T_u(T^*\mathcal{M})$. By
(5.34) this is equivalent to $\Omega_u(E_u, X_u) = 0$ for all $X_u \in T_u\mathcal{L} \subset T_u(T^*\mathcal{M})$. As
$\mathcal{L}$ is Lagrangian (i.e., maximally isotropic), this is equivalent to $E_u \in T_u\mathcal{L} \subset
T_u(T^*\mathcal{M})$. □

If a Lagrangian submanifold of $T^*\mathcal{M}$ is invariant under the flow of the
Euler vector field, it is called conic (cf. Guckenheimer [52]) or homogeneous
(cf. Guillemin and Sternberg [55]). Conic Lagrangian submanifolds are of
relevance as generalized solutions of the eikonal equation for dilation-invariant
ray-optical structures. They are necessarily non-transverse to the fibers, i.e.,
they cannot be associated with classical solutions of the eikonal equation.
By Proposition 5.5.7, the pull-back of the canonical one-form $\theta$ to a conic
Lagrangian submanifold vanishes identically, i.e., its kernel distribution does
not give a foliation into smooth hypersurfaces.

Let us consider, on the other hand, a Lagrangian submanifold $\mathcal{L}$ of $T^*\mathcal{M}$
such that the Euler vector field is nowhere tangent to $\mathcal{L}$. In this case
Proposition 5.5.7 guarantees that the pull-back of $\theta$ to $\mathcal{L}$ has no zeros. As a
consequence of Proposition 5.5.6, the kernel distribution of this one-form defines a
foliation of $\mathcal{L}$ into smooth hypersurfaces. We introduce the following
terminology.

Definition 5.5.3. Let $\mathcal{L}$ be a generalized solution of the eikonal equation
of a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$. Assume that the Euler vector field is
nowhere tangent to $\mathcal{L}$, such that the pull-back $\theta_{\mathcal{L}}$ to $\mathcal{L}$ of the canonical
one-form $\theta$ has no zeros. Then an integral manifold of the kernel distribution of
$\theta_{\mathcal{L}}$ is called a lifted wave surface of $\mathcal{L}$.

On each simply connected open subset of $\mathcal{L}$, a lifted wave surface of $\mathcal{L}$
can be represented as a surface $S = \mathrm{const}$, where $S$ satisfies $dS = \theta_{\mathcal{L}}$. If, in
the situation of Definition 5.5.3, $\mathcal{L}$ is transverse to the fibers of $T^*\mathcal{M}$, the
projection $\tau^*_{\mathcal{M}}$ maps each lifted wave surface onto a smooth hypersurface in
$\mathcal{M}$ which, in agreement with our earlier terminology, is called a wave surface
associated with $\mathcal{L}$. If $\mathcal{L}$ is not transverse to the fibers, the image of a lifted
wave surface under the projection $\tau^*_{\mathcal{M}}$ need not be a smooth submanifold of
$\mathcal{M}$ and could be called a generalized wave surface.
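
As a hedged illustration of lifted and generalized wave surfaces, one may evaluate the pull-back of the canonical one-form for a point source in flat space. The setting ($\mathcal{M} = \mathbb{R}^n$, $H(x,p) = \tfrac{1}{2}(\delta^{ab} p_a p_b - 1)$, $\theta = p_a\,dx^a$ in a natural chart, and $\mathcal{L}$ generated from the fiber over a point $q$) is an assumption chosen for this sketch.

```latex
% Illustrative assumption: L = { (q + s\,\omega, \omega) : |\omega| = 1, s \in R } in T^*R^n,
% parametrized by (s, \omega); \theta = p_a dx^a.
\begin{align*}
x^a &= q^a + s\,\omega^a , \qquad p_a = \omega_a , \\
\theta_{\mathcal{L}} &= \omega_a \left( \omega^a\, ds + s\, d\omega^a \right)
   = ds , \qquad \text{since } \omega_a\, d\omega^a = \tfrac{1}{2}\, d(\omega_a \omega^a) = 0 .
\end{align*}
% Hence \theta_L has no zeros, S = s (up to an additive constant), and
% the lifted wave surfaces are the sets s = const.
% Projection to R^n:
%   s \neq 0 :  the sphere |x - q| = |s|   (a wave surface),
%   s = 0    :  the single point q         (a generalized wave surface).
```

The vanishing of $\omega_a\,d\omega^a$ is what makes $\theta_{\mathcal{L}}$ closed here; the degenerate image at $s = 0$ is an instance of a lifted wave surface whose projection fails to be a hypersurface.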

We have, thus, generalized our earlier observation that a classical solution
$S: \mathcal{U} \to \mathbb{R}$ of the eikonal equation is associated with a “slicing” of $\mathcal{U}$ into
wave surfaces and a “threading” of $\mathcal{U}$ into rays. A generalized solution $\mathcal{L}$ of the
eikonal equation is associated with a “slicing” of $\mathcal{L}$ into lifted wave surfaces
and a “threading” of $\mathcal{L}$ into lifted rays, provided that the Euler vector field
is nowhere tangent to $\mathcal{L}$. The question of whether lifted rays are transverse
to lifted wave surfaces is answered in the following proposition.

Proposition 5.5.8. Let $\mathcal{L}$ be a generalized solution of the eikonal equation
of a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$. Assume that the Euler vector field is
nowhere tangent to $\mathcal{L}$, i.e., that $\mathcal{L}$ is foliated not only into lifted rays but also
into lifted wave surfaces. Then for any point $u \in \mathcal{L} \subset \mathcal{N}$ the following two
statements are equivalent.

(a) The lifted ray through $u$ is tangent to the lifted wave surface through $u$.

(b) The Euler vector field $E$ is tangent to $\mathcal{N}$ at $u$.

Here we speak of “the” lifted ray through $u$ in the sense that this lifted ray is
unique up to reparametrization and extension.

Proof. Let $X_u \in T_u\mathcal{N} \subset T_u(T^*\mathcal{M})$ be tangent to the lifted ray through
$u$, i.e., let $X_u$ be a vector that spans the characteristic direction at $u$. By
Definition 5.5.3, (a) is satisfied if and only if $\theta_u(X_u) = 0$. By Proposition 5.4.6,
this is equivalent to (b). □

Let us apply this proposition to a ray-optical structure $\mathcal{N}$ which is
everywhere transverse to the flow of the Euler vector field, such as given by
Example 5.1.2, 5.1.4 or 5.1.5. Then the Euler vector field cannot be tangent
to a generalized solution of the eikonal equation, i.e., it is automatically
guaranteed that every generalized solution $\mathcal{L}$ of the eikonal equation is foliated
into lifted wave surfaces. By Proposition 5.5.8, those lifted wave surfaces are
always transverse to the lifted rays into which $\mathcal{L}$ is foliated. Any real valued
(local) function $S$ on $\mathcal{L}$ with $dS = \theta_{\mathcal{L}}$ gives a (local) parametrization
on each of those lifted rays. This distinguished parametrization, which is
unique (globally along the lifted ray) up to an additive constant, was already
mentioned in Sect. 5.4, see (5.38).

The situation is completely different for a dilation-invariant ray-optical
structure $\mathcal{N}$, such as given by Example 5.1.1 or 5.1.3. For a generalized
solution $\mathcal{L}$ of the eikonal equation, the Euler vector field $E$ may or may not be
tangent to $\mathcal{L}$ at any of its points. Only in the case that $E$ is nowhere
tangent to $\mathcal{L}$ is $\mathcal{L}$ foliated into lifted wave surfaces. By Proposition 5.5.8, those
lifted wave surfaces are then foliated into lifted rays. In other words, any real
valued function $S$ on $\mathcal{L}$ with $dS = \theta_{\mathcal{L}}$ is constant along each of those lifted
rays.

For an arbitrary ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ these considerations apply
to the maximal open subset of $\mathcal{N}$ on which $E$ is non-tangent to $\mathcal{N}$ and to
the maximal open subset of $\mathcal{N}$ on which $E$ is tangent to $\mathcal{N}$. A full discussion
requires an appropriate matching procedure in addition. This is rather
cumbersome and we abstain from working out an example.



5.6 Caustics 

In the last section we have seen that generalized solutions $\mathcal{L}$ of the eikonal
equation are foliated into lifted rays. If projected to $\mathcal{M}$ those lifted rays give
a congruence of rays as long as $\mathcal{L}$ is transverse to the fibers of $T^*\mathcal{M}$. At points
where $\mathcal{L}$ fails to be transverse to the fibers neighboring rays start intersecting,
see Figure 5.6. In optical terminology, this indicates the formation of a
“caustic”. Therefore we introduce the following mathematical definition.

Definition 5.6.1. Let $\mathcal{L} \subset T^*\mathcal{M}$ be an embedded Lagrangian $C^\infty$ submanifold
of $T^*\mathcal{M}$ and denote the restriction to $\mathcal{L}$ of the cotangent bundle projection
by $\kappa = \tau^*_{\mathcal{M}}|_{\mathcal{L}} : \mathcal{L} \to \mathcal{M}$. Then $u \in \mathcal{L}$ is called a critical point of $\mathcal{L}$
iff the tangent map $T_u\kappa: T_u\mathcal{L} \to T_{\kappa(u)}\mathcal{M}$ is not surjective. The set

$$\mathrm{Caust}_{\mathcal{L}} = \{\, \kappa(u) \in \mathcal{M} \mid u \text{ is a critical point of } \mathcal{L} \,\} \tag{5.44}$$

is called the caustic of $\mathcal{L}$.

Clearly, the critical points of $\mathcal{L}$ are exactly those points where $\mathcal{L}$ is not
transverse to the fibers of $T^*\mathcal{M}$. In other words, $\mathcal{L}$ is everywhere transverse
to the fibers if and only if $\mathrm{Caust}_{\mathcal{L}} = \emptyset$.

In any case, $\mathrm{Caust}_{\mathcal{L}}$ is a set of measure zero in $\mathcal{M}$. This is an immediate
consequence of the well-known Sard theorem which is proven, e.g., in
Abraham and Robbin [2], p. 37. In general, $\mathrm{Caust}_{\mathcal{L}}$ features cusps, edges
and vertices, i.e., $\mathrm{Caust}_{\mathcal{L}}$ is not a submanifold of $\mathcal{M}$. Thus, the geometry of
caustics can be very complicated even locally. Quite generally, the variety of
cusps, edges and vertices possible is so vast that a complete classification is
not feasible. However, V. Arnold was able to locally classify all caustic types
which are stable in a certain sense. For the details of this highly technical
work we refer to Arnold, Gusein-Zade and Varchenko [9]. Arnold’s formalism
was applied to general relativity, e.g., by Friedrich and Stewart [45], by
Petters [114], by Hasse, Kriele and Perlick [57], and by Low [87].
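
The following sketch illustrates Definition 5.6.1 numerically in a deliberately simple setting. It assumes straight rays in the Euclidean plane that start on the parabola $y = x^2/2$ and run along its unit normal; the lifted family $\{(\text{position}, \text{unit normal})\}$ then plays the role of $\mathcal{L}$, and the critical points are those where the map $(t, s) \mapsto$ position has vanishing Jacobian determinant. The curve, the function names and the finite-difference step are all hypothetical choices made for this example.

```python
import math

def ray_point(t, s):
    """Point reached after arc length s along the straight ray that starts
    at (t, t^2/2) on the parabola and runs along the unit normal (-t, 1)/sqrt(1+t^2)."""
    norm = math.sqrt(1.0 + t * t)
    return (t - s * t / norm, 0.5 * t * t + s / norm)

def jacobian_det(t, s, h=1e-6):
    """Determinant of the derivative of (t, s) -> ray_point(t, s),
    estimated by central differences. It vanishes exactly at critical points."""
    xt1, yt1 = ray_point(t + h, s)
    xt0, yt0 = ray_point(t - h, s)
    xs1, ys1 = ray_point(t, s + h)
    xs0, ys0 = ray_point(t, s - h)
    dx_dt, dy_dt = (xt1 - xt0) / (2 * h), (yt1 - yt0) / (2 * h)
    dx_ds, dy_ds = (xs1 - xs0) / (2 * h), (ys1 - ys0) / (2 * h)
    return dx_dt * dy_ds - dy_dt * dx_ds

def focal_distance(t):
    """Radius of curvature of the parabola at parameter t; neighboring
    normals meet at this distance along the ray."""
    return (1.0 + t * t) ** 1.5

def caustic_point(t):
    """Projection to the plane of the critical point on the ray labelled by t."""
    return ray_point(t, focal_distance(t))
```

In this example the caustic is the evolute of the parabola, $(-t^3,\, 1 + \tfrac{3}{2}t^2)$, which has a cusp at $t = 0$; it is a set of measure zero but not a submanifold of the plane, in line with the remarks above.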

Here we want, of course, to apply Definition 5.6.1 to the case that $\mathcal{L}$ is a
generalized solution of the eikonal equation of a ray-optical structure $\mathcal{N}$ on
$\mathcal{M}$. Then $\mathcal{L}$ is foliated into lifted rays which can be projected to $\mathcal{M}$ to give
a family of rays. In this situation, $\mathrm{Caust}_{\mathcal{L}}$ is the set of all points in $\mathcal{M}$ where
infinitesimally neighboring rays intersect each other. To put this rigorously
we introduce the following notation (see Figure 5.7).

Definition 5.6.2. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$.

(a) A $C^\infty$ vector field $Z$ on $\mathcal{N}$ is called a field of connecting vectors iff, for
every characteristic vector field $X$ on $\mathcal{N}$, the Lie bracket $[Z,X]$ is, again,
characteristic.

(b) Let $\xi: I \to \mathcal{N}$ be a lifted ray of $\mathcal{N}$ and let $\hat{J}: I \to T\mathcal{N}$ be a $C^\infty$ map
with $\hat{J}(s) \in T_{\xi(s)}\mathcal{N}$ for all $s \in I$. $\hat{J}$ is called a lifted Jacobi field along
$\xi$ iff it can be represented, locally around any parameter value $s \in I$, in
the form $\hat{J} = Z \circ \xi$ where $Z$ is a field of connecting vectors. Two lifted
Jacobi fields along $\xi$ are called equivalent iff they differ by a multiple of
the tangent field of $\xi$. The respective equivalence classes are called lifted
Jacobi classes. A lifted Jacobi field is called trivial if it is equivalent to
the zero vector field, i.e., if it is parallel to the tangent field of $\xi$.

(c) If $\hat{J}$ is a lifted Jacobi field along $\xi$, $J = T\tau^*_{\mathcal{M}} \circ \hat{J}$ is called a Jacobi field
along the ray $\lambda = \tau^*_{\mathcal{M}} \circ \xi$. Two Jacobi fields along $\lambda$ are called equivalent
if they differ by a multiple of the tangent field of $\lambda$. The respective
equivalence classes are called Jacobi classes. A Jacobi field is called trivial if
it is equivalent to the zero vector field, i.e., if it is parallel to the tangent
field of $\lambda$.

For a ray-optical structure $\mathcal{N}$ whose rays are geodesics, such as in our
Examples 5.1.1, 5.1.2 and 5.1.5, Definition 5.6.2 (c) reproduces the standard
textbook definition of Jacobi fields. (Note, however, that those standard
textbooks usually assume their geodesics to be affinely parametrized whereas our
rays are arbitrarily parametrized.)

If $\hat{J}$ is a lifted Jacobi field along a lifted ray $\xi$, the “arrow-head” of $\hat{J}$ can
be thought of as tracing a neighboring lifted ray which is infinitesimally close
to $\xi$. All members of a lifted Jacobi class trace the same neighboring lifted
ray.

Fig. 5.7. A lifted Jacobi field $\hat{J}$ connects a lifted ray $\xi$ with a neighboring lifted
ray; a Jacobi field $J$ connects a ray $\lambda$ with a neighboring ray.



To construct a lifted Jacobi field along a lifted ray $\xi: [s_1,s_2] \to \mathcal{N}$ we
consider a variation of $\xi$, i.e., a $C^\infty$ map $\eta: \,]-\varepsilon_0,\varepsilon_0[\, \times [s_1,s_2] \to \mathcal{N}$
such that $\eta(\varepsilon,\,\cdot\,)$ is a lifted ray for all $\varepsilon \in \,]-\varepsilon_0,\varepsilon_0[$ and $\eta(0,\,\cdot\,) = \xi$. Then
differentiation with respect to the variational parameter $\varepsilon$ at $\varepsilon = 0$ gives a
lifted Jacobi field along $\xi$, $\hat{J}(s) = \eta(\,\cdot\,,s)'|_{\varepsilon=0}$. In a natural chart, denoting
the derivative with respect to the variational parameter by $\delta$ such that

$$\hat{J} = \delta x^a \frac{\partial}{\partial x^a} + \delta p_a \frac{\partial}{\partial p_a}\,, \tag{5.45}$$

a lifted Jacobi field is determined by the set of equations

$$\delta\big(H(x,p)\big) = 0\,, \tag{5.46}$$

$$\delta\Big(\dot{x}^a - k\,\frac{\partial H}{\partial p_a}(x,p)\Big) = 0\,, \tag{5.47}$$

$$\delta\Big(\dot{p}_a + k\,\frac{\partial H}{\partial x^a}(x,p)\Big) = 0\,, \tag{5.48}$$

where $H$ is any (local) Hamiltonian for the ray-optical structure considered.
(5.46), (5.47) and (5.48) are, of course, just the conditions that the ray
equations (5.10), (5.11) and (5.12) are to be preserved. The $\delta$-derivatives in (5.46),
(5.47) and (5.48) can be evaluated with the help of the usual product and
chain rules. As derivatives with respect to the variational parameter and with
respect to the curve parameter commute, (5.47) and (5.48) give us a system
of first order linear differential equations for $\delta p_a$ and $\delta x^a$. Any solution of
this system, with any $\delta k$, that satisfies (5.46) gives us a lifted Jacobi field.
It is easy to verify the following fact. To any initial values $\delta x^a(s_1)$, $\delta p_a(s_1)$
that satisfy (5.46) there is a solution $\delta x^a$, $\delta p_a$ of the full system (5.46), (5.47)
and (5.48), and it is unique up to adding a multiple of the tangent field. This
freedom corresponds to the freedom of choosing $\delta k$ at will.

This observation proves the following result which can be verified directly
from Definition 5.6.2 as well, without referring to the coordinate representation
(5.46), (5.47) and (5.48).

Proposition 5.6.1. The set of all lifted Jacobi fields along a lifted ray $\xi$ is
an infinite dimensional real vector space. The set of lifted Jacobi classes along
$\xi$ is a $(2n-2)$-dimensional real vector space, corresponding to the $(2n-2)$
directions transverse to the characteristic direction in $\mathcal{N}$.



As a consequence, the set of Jacobi fields along a ray $\lambda$ is an infinite
dimensional real vector space. The set of Jacobi classes along $\lambda$ is a finite
dimensional real vector space of dimension $\le (2n-2)$. Since there is a ray
through each point of $\mathcal{M}$, the dimension cannot be smaller than $(n-1)$.
This minimal value is realized, e.g., in Examples 5.1.3 and 5.1.4. We shall
now demonstrate that the maximal value is realized in strongly regular
ray-optical structures. (Please recall Definition 5.2.2.) The proof will be based on
the observation that in the strongly regular case Jacobi classes are determined
by a second order linear differential equation that admits an existence and
uniqueness theorem; so the dimension of the space of Jacobi classes can be
found by counting the allowed values for the initial data.

Proposition 5.6.2. Let $\mathcal{N}$ be a strongly regular ray-optical structure on $\mathcal{M}$
and $\lambda: [s_1,s_2] \to \mathcal{M}$ be a ray of $\mathcal{N}$. Choose an arbitrary affine connection
$\nabla$ on $\mathcal{M}$. Then for any two vectors $X$ and $Y$ in $T_{\lambda(s_1)}\mathcal{M}$ there is a Jacobi
field $J$ along $\lambda$ with $J(s_1) = X$ and $\nabla_{\dot\lambda(s_1)}J = Y$. This Jacobi field is unique
up to transformations $J \mapsto J + w\,\dot\lambda$ with $w(s_1) = 0$. As a consequence, for a
strongly regular ray-optical structure the vector space of Jacobi classes along
any ray $\lambda$ has dimension $(2n-2)$, corresponding to the $(n-1)$ components of
$X$ and the $(n-1)$ components of $Y$ transverse to the tangent vector $\dot\lambda(s_1)$.

Proof. Let $\xi: [s_1,s_2] \to \mathcal{N}$ be a lifted ray that projects onto $\lambda$. First we
give the proof of the proposition under the additional assumption that $\xi$ can
be covered by the domain of a Hamiltonian and by a natural chart. Then our
assumption of strong regularity guarantees that, in the natural chart chosen,
condition (5.15) holds along $\xi$; we can thus introduce the inverse matrix by

$$\begin{pmatrix} (G_{ca}) & (G_c) \\ (G_a) & G \end{pmatrix}
\begin{pmatrix} (H^{ab}) & (H^a) \\ (H^b) & 0 \end{pmatrix}
= \begin{pmatrix} (\delta_c^b) & 0 \\ 0 & 1 \end{pmatrix} \tag{5.49}$$

where the components $G_{ca}$, $G_a$ and $G$ are to be viewed as functions of the
curve parameter $s$. Please recall that, in terms of their coordinate
representation (5.45), lifted Jacobi fields along $\xi$ are determined by (5.46), (5.47), and
(5.48). It is our goal to eliminate $\delta k$ and $\delta p_a$ from these equations and to get
a second order differential equation for $\delta x^a$ alone, i.e., to get an equation for
Jacobi fields rather than for lifted Jacobi fields. To that end we observe that
(5.46) and (5.47) can be written as a matrix equation in the following way.



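$$\begin{pmatrix} (H^{ab}) & (H^a) \\ (H^b) & 0 \end{pmatrix}
\begin{pmatrix} (\delta p_b) \\ \frac{1}{k}\,\delta k \end{pmatrix}
= \begin{pmatrix} \Big( \frac{1}{k}\,\delta\dot{x}^a - \frac{\partial^2 H}{\partial p_a\,\partial x^b}\,\delta x^b \Big) \\[4pt]
 - \frac{\partial H}{\partial x^b}\,\delta x^b \end{pmatrix} \tag{5.50}$$

With the help of (5.49), (5.50) can be solved for $\delta p_c$ and $\delta k$,

$$\begin{pmatrix} (\delta p_c) \\ \frac{1}{k}\,\delta k \end{pmatrix}
= \begin{pmatrix} (G_{ca}) & (G_c) \\ (G_a) & G \end{pmatrix}
\begin{pmatrix} \Big( \frac{1}{k}\,\delta\dot{x}^a - \frac{\partial^2 H}{\partial p_a\,\partial x^b}\,\delta x^b \Big) \\[4pt]
 - \frac{\partial H}{\partial x^b}\,\delta x^b \end{pmatrix} \tag{5.51}$$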



With the help of (5.51), we may eliminate $\delta p_c$ and $\delta k$ from (5.48) which gives
an equation of the form

$$G_{ab}\,\delta\ddot{x}^b + B_{ab}\,\delta\dot{x}^b + C_{ab}\,\delta x^b = 0\,. \tag{5.52}$$

Here $G_{ab}$ has the same meaning as in (5.49) and (5.51) whereas $B_{ab}$ and $C_{ab}$
are some functions of $s$ the special form of which will be of no interest in the
following. By construction, $J = \delta x^a\,\partial/\partial x^a$ is a Jacobi field if and only if the $\delta x^a$
satisfy (5.52). From (5.49) we read that

$$G_{ca}\,H^a = 0\,. \tag{5.53}$$

Thus, for each parameter value $s$ the matrix $(G_{ca}(s))$ has a non-trivial kernel
which is spanned by the tangent vector of the ray, so (5.52) cannot be solved
for the second derivatives. This reflects the fact that initial values $\delta x^a(s_1)$
and $\delta\dot{x}^a(s_1)$ do not fix a solution of (5.52) uniquely but leave the freedom
of adding multiples of the tangent field. At each parameter value $s$ we may
introduce the $(n-1)$-dimensional vector space

$$L(s) = \{\, (z^a) \in \mathbb{R}^n \mid G_a(s)\,z^a = 0 \,\} \tag{5.54}$$

which is transverse to the tangent vector $\dot{x}^a(s) = k(s)\,H^a(s)$ since, by (5.49),

$$G_a\,H^a = 1\,. \tag{5.55}$$

From (5.49) we read that for all $(z^a)$ in $L(s)$ the equation

$$H^{ba}(s)\,G_{ac}(s)\,z^c = z^b \tag{5.56}$$

holds true, i.e., that on $L(s)$ the matrix $(G_{ac}(s))$ is invertible, with $(H^{ba}(s))$
being its inverse. If we restrict to Jacobi fields with

$$(\delta x^a(s)) \in L(s) \quad \text{for all } s, \tag{5.57}$$

then (5.52) gives us a second order differential equation for $\delta x^a$ that admits
an existence and uniqueness theorem. In order to prove this it is convenient
to choose the coordinates in such a way that $H^a = \delta^a_n$ and $G_a = \delta_a^n$ along
$\lambda$, which is possible owing to (5.55). With this choice of coordinates (5.57)
implies that $(\delta\dot{x}^a(s)) \in L(s)$ and $(\delta\ddot{x}^a(s)) \in L(s)$ for all $s$. As a consequence,
multiplying (5.52) with $H^{ca}$ results in

$$\delta\ddot{x}^c + H^{ca} B_{ab}\,\delta\dot{x}^b + H^{ca} C_{ab}\,\delta x^b = 0\,. \tag{5.58}$$

Giving initial values $J(s_1) = X$ and $\nabla_{\dot\lambda(s_1)}J = Y$ is equivalent to giving initial
values $\delta x^a(s_1)$ and $\delta\dot{x}^a(s_1)$. By adding appropriate multiples of $\dot{x}^a(s_1)$ we get
initial values in $L(s_1)$ which determine a unique solution $\delta x^a$ of (5.58). Then
$\delta x^a + f\,\dot{x}^a$, with $f(s_1)$ chosen appropriately, is a Jacobi field that satisfies the
original initial conditions. Except for its value at the initial point, $f$ can be
chosen at will. This completes the proof under the assumption that $\lambda$ can be
covered by a chart of the desired form.

In the general case we divide the domain of $\lambda$ into subintervals such that
the restriction of $\lambda$ to each subinterval can be covered by a local chart in which
the equations $H^a = \delta^a_n$ and $G_a = \delta_a^n$ hold along $\lambda$. Then we get the desired
Jacobi fields by solving (5.58) piecewise, where on each subinterval the initial
values are determined by the end values on the preceding subinterval. □
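
As a consistency check on (5.49) to (5.55), the matrices can be written out explicitly in the simplest flat case. The Hamiltonian $H(x,p) = \tfrac{1}{2}(\delta^{ab} p_a p_b - 1)$ with $k = 1$, and the convention $H^a = \partial H/\partial p_a$, $H^{ab} = \partial^2 H/\partial p_a \partial p_b$, are assumptions of this sketch rather than statements from the proof.

```latex
% Illustrative assumption: H(x,p) = (1/2)(\delta^{ab} p_a p_b - 1), k = 1, so that
% H^{ab} = \delta^{ab}, H^a = p^a, and p_a p^a = 1 along every lifted ray.
\begin{align*}
G_{ca} &= \delta_{ca} - p_c\,p_a , \qquad G_a = p_a , \qquad G = -1 , \\
G_{ca} H^{ab} + G_c H^b &= \delta_c^b - p_c\,p^b + p_c\,p^b = \delta_c^b , \\
G_{ca} H^a &= p_c - p_c\,(p_a p^a) = 0 , \\
G_a H^{ab} + G\,H^b &= p^b - p^b = 0 , \qquad G_a H^a = p_a p^a = 1 .
\end{align*}
% H does not depend on x and p is constant along the ray, so the elimination gives
\[
  (\delta_{ab} - p_a p_b)\,\delta\ddot{x}^b = 0 ,
\]
% i.e. B_{ab} = C_{ab} = 0 in this case. Modulo multiples of the tangent \dot{x}^a = p^a,
\[
  \delta x^a(s) = X^a + (s - s_1)\,Y^a ,
\]
% which leaves (n-1) + (n-1) = 2n-2 free transverse components.
```

Here $G_{ab}$ is the projector onto the directions orthogonal to the ray, so its kernel is spanned by the tangent vector, and the count of free data reproduces the dimension $(2n-2)$ of the space of Jacobi classes.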

Quite generally, Jacobi fields and lifted Jacobi fields can be used to
characterize caustics in the following way. Let $\mathcal{L}$ be a generalized solution
of the eikonal equation of a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ and let
$\xi: [s_1,s_2] \to \mathcal{L} \subset \mathcal{N}$ be a lifted ray through the point $u = \xi(s_2) \in \mathcal{L}$.
Then $u$ is a critical point of $\mathcal{L}$, in the sense of Definition 5.6.1, if and only
if there is a non-zero vertical vector $Z_u \in T_u\mathcal{L}$. By the above argument, the
existence of such a vector $Z_u$ is equivalent to the existence of a non-trivial
lifted Jacobi field $\hat{J}$ along $\xi$ such that $\hat{J}$ is everywhere tangent to $\mathcal{L}$ and $\hat{J}(s_2)$
is vertical. (Please note that $\hat{J}$ is everywhere tangent to $\mathcal{L}$ if $\hat{J}(s_2)$ is tangent
to $\mathcal{L}$, owing to the Lagrange property of $\mathcal{L}$.) Verticality of $\hat{J}(s_2)$ indicates
an intersection of the ray $\lambda = \tau^*_{\mathcal{M}} \circ \xi$ with the “infinitesimally neighboring ray”
$J = T\tau^*_{\mathcal{M}} \circ \hat{J}$. In this sense, $\mathrm{Caust}_{\mathcal{L}}$ is the set of all points where
infinitesimally neighboring members of the family of rays determined by $\mathcal{L}$ have an
intersection. Note that, in general, $J$ may be zero on a whole interval, i.e.,
the two neighboring rays may coincide on a whole interval. To exclude this
unwanted situation one introduces the following definition.

Definition 5.6.3. Let $\lambda: I \to \mathcal{M}$ be a ray of a ray-optical structure $\mathcal{N}$ on
$\mathcal{M}$ and fix two different parameter values $s_1, s_2 \in I$. Let $\mathrm{Jac}(\lambda, s_1, s_2)$ denote
the vector space of Jacobi classes $[J]$ along $\lambda$ such that there is a $J \in [J]$ with
$J(s_1) = 0$ and $J(s_2) = 0$. Then the point $\lambda(s_2)$ is called conjugate to $\lambda(s_1)$
along $\lambda$ iff the dimension of $\mathrm{Jac}(\lambda,s_1,s_2)$ is non-zero and this dimension is
called the multiplicity of the conjugate point.
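
A minimal numerical sketch of a conjugate point, under assumptions that go beyond the text: on a two-dimensional Riemannian manifold of constant Gaussian curvature $K$, the component of a Jacobi field transverse to a unit-speed geodesic obeys the standard scalar equation $J'' + K J = 0$. The code integrates this equation with $J(0) = 0$, $J'(0) = 1$ and reports the first parameter value at which $J$ returns to zero. The function name, step count and integration range are hypothetical choices.

```python
import math

def first_conjugate_parameter(curvature, s_max=10.0, steps=100000):
    """Integrate J'' + curvature * J = 0 with J(0) = 0, J'(0) = 1 by classical RK4
    and return the first s > 0 at which J changes sign (None if there is none up to s_max)."""
    h = s_max / steps
    s, j, v = 0.0, 0.0, 1.0
    for _ in range(steps):
        # One RK4 step for the first-order system (J, V)' = (V, -curvature * J).
        k1j, k1v = v, -curvature * j
        k2j, k2v = v + 0.5 * h * k1v, -curvature * (j + 0.5 * h * k1j)
        k3j, k3v = v + 0.5 * h * k2v, -curvature * (j + 0.5 * h * k2j)
        k4j, k4v = v + h * k3v, -curvature * (j + h * k3j)
        j_new = j + h * (k1j + 2.0 * k2j + 2.0 * k3j + k4j) / 6.0
        v_new = v + h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if j > 0.0 and j_new <= 0.0:
            # Locate the zero inside the step by linear interpolation.
            return s + h * j / (j - j_new)
        s, j, v = s + h, j_new, v_new
    return None
```

For $K = 1$ (unit sphere) the first zero lies at $s = \pi$, the antipodal point, with multiplicity $n - 1 = 1$; for $K = 0$ the solution $J = s$ never returns to zero, so a straight line in the plane carries no conjugate points.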

For the ray-optical structures of Example 5.1.5, Definition 5.6.3 coincides
with the standard textbook definition of conjugate points in Riemannian
geometry. For the ray-optical structures of Example 5.1.1 (and 5.1.2,
respectively) it coincides with the definition of light-like (and time-like, respectively)
conjugate points in Lorentzian geometry, cf. Beem, Ehrlich, and Easley [11].
The ray-optical structures of Example 5.1.3 and 5.1.4 do not admit any
conjugate points.

Later we shall come back to the notion of Jacobi fields and of conjugate
points. In particular, we shall explicitly evaluate the differential equation for
Jacobi fields of isotropic ray-optical structures on Lorentzian manifolds in
Sect. 6.4, and we shall use the notion of conjugate points and their multiplicity
to develop a Morse theory for rays of strongly (hyper-)regular ray-optical
structures in Sect. 7.5. In the latter context, the following observation will
be important.

Proposition 5.6.3. Let $\lambda: I \to \mathcal{M}$ be a ray of a strongly regular ray-optical
structure $\mathcal{N}$ and fix a parameter value $s_1 \in I$. Then the following
holds true.

(a) If $\lambda(s)$ is conjugate to $\lambda(s_1)$ along $\lambda$, its multiplicity cannot be bigger
than $(n-1)$.

(b) There is an $\varepsilon > 0$ such that for $0 < |s - s_1| < \varepsilon$ the point $\lambda(s)$ cannot be
conjugate to $\lambda(s_1)$ along $\lambda$.

(c) Let $\xi$ be a lifted ray that projects onto $\lambda$ and assume that, with respect to
any local Hamiltonian and any natural chart, the matrix in (5.15) is not
only non-degenerate but even positive definite at all points of $\xi$. If $\lambda(s_2)$
is conjugate to $\lambda(s_1)$, then there is an $\varepsilon > 0$ such that for $0 < |s - s_2| < \varepsilon$
the point $\lambda(s)$ cannot be conjugate to $\lambda(s_1)$ along $\lambda$.

Proof. By Proposition 5.6.2, the vector space of Jacobi classes that vanish at
a particular point has dimension $(n-1)$. Hence, the dimension of the vector space
of Jacobi classes that vanish at two points cannot be bigger than $(n-1)$. This proves
(a).

To prove (b), we consider the set of all Jacobi fields $J$ that vanish at
$s_1$. By Proposition 5.6.2, the derivatives $\nabla_{\dot\lambda(s_1)}J$ of those Jacobi fields span
an $(n-1)$-dimensional vector space transverse to $\dot\lambda(s_1)$, where $\nabla$ denotes
any affine connection on $\mathcal{M}$. By Taylor’s theorem, this implies that for $0 <
|s - s_1| < \varepsilon$ the values $J(s)$ of those Jacobi fields span an $(n-1)$-dimensional
vector space transverse to $\dot\lambda(s)$. This proves (b).

We now turn to the proof of (c) which is more difficult. If the point $\lambda(s_2)$
is conjugate to $\lambda(s_1)$ along $\lambda$, part (a) implies that the multiplicity $m$ of this
conjugate point has to satisfy the inequality $m \le n-1$. In this situation we
can find Jacobi fields $J_1, \dots, J_{n-1}$ along $\lambda$ such that

- the vector fields $J_1, \dots, J_{n-1}, \dot\lambda$ are linearly independent over $\mathbb{R}$;

- $J_1(s_1) = \dots = J_{n-1}(s_1) = 0$;

- $J_1(s_2) = \dots = J_m(s_2) = 0$;

- the vectors $J_{m+1}(s_2), \dots, J_{n-1}(s_2), \dot\lambda(s_2)$ are linearly independent.

A vector field along $\lambda$ that vanishes at $s_1$ is a Jacobi field if and only if it
differs from a linear combination of $J_1, \dots, J_{n-1}$ with constant coefficients by
a multiple of $\dot\lambda$. Here we made use of Proposition 5.6.2.

Now we give the proof of (c) under the additional assumption that the
lifted ray $\xi|_{[s_1,s_2]}$ is contained in the domain of a natural chart $(x,p)$ and of
a Hamiltonian $H$ for $\mathcal{N}$. For $A = 1, \dots, (n-1)$, our Jacobi fields are then
represented in the form

$$J_A = \delta_A x^a\,\frac{\partial}{\partial x^a} \tag{5.59}$$

with

$$\delta_A x^a(s_1) = 0 \quad \text{for } A = 1, \dots, (n-1)\,; \tag{5.60}$$

$$\delta_I x^a(s_2) = 0 \quad \text{for } I = 1, \dots, m\,; \tag{5.61}$$

$$(\delta_{m+1}x^a(s_2)), \dots, (\delta_{n-1}x^a(s_2)), (\dot{x}^a(s_2)) \ \text{are linearly independent.} \tag{5.62}$$

We are still free to change the Jacobi fields by a transformation of the form
$J_A \mapsto J_A + f_A\,\dot\lambda$ with functions $f_A$ that satisfy $f_A(s_1) = 0$ for all indices
$A = 1, \dots, (n-1)$ and $f_I(s_2) = 0$ for all indices $I = 1, \dots, m$. As shown in
the proof of Proposition 5.6.2, we may use this freedom in such a way that, in
an appropriately chosen chart, the second order differential equation (5.58)
is satisfied by $\delta x^a = \delta_A x^a$ for all $A = 1, \dots, (n-1)$. Then (5.61) implies that

$$(\delta_1\dot{x}^a(s_2)), \dots, (\delta_m\dot{x}^a(s_2)) \ \text{are linearly independent.} \tag{5.63}$$

Otherwise there would be a non-zero solution $\delta x^a(s) = c^1\delta_1 x^a(s) + \dots +
c^m\delta_m x^a(s)$ of the linear differential equation (5.58) with $\delta x^a(s_2) = 0$ and
$\delta\dot{x}^a(s_2) = 0$, which is impossible.

In the following we have to use the positive-definiteness assumption of
(c). This assumption implies that, along $\xi$, we have (5.49) at our disposal,
with both matrices on the left-hand side positive definite. (Here we make use
of the elementary fact that the inverse of a positive definite matrix is, again,
positive definite.) For $A = 1, \dots, (n-1)$, the $\delta_A x^a$ are the components of a
Jacobi field. Hence, inserting $\delta x^a = \delta_A x^a$ into (5.51) determines $\delta p_b = \delta_A p_b$
and $\delta k = \delta_A k$ in such a way that (5.46), (5.47), and (5.48) are satisfied, i.e.,
such that

$$\hat{J}_A = \delta_A x^a\,\frac{\partial}{\partial x^a} + \delta_A p_a\,\frac{\partial}{\partial p_a} \tag{5.64}$$

is a lifted Jacobi field along $\xi$ for each $A = 1, \dots, (n-1)$. With the help of
(5.47) and (5.48) it is readily verified that

$$\big(\delta_J p_a\,\delta_I x^a - \delta_I p_a\,\delta_J x^a\big)^{\textstyle\cdot} = 0 \tag{5.65}$$

for any two indices $I$ and $J$ between 1 and $(n-1)$. By (5.60), this implies
that

$$\delta_J p_a\,\delta_I x^a - \delta_I p_a\,\delta_J x^a = 0\,. \tag{5.66}$$

Evaluating this equation at the parameter value $s_2$, and using (5.61), we find

$$\big(\delta_I p_a\,\delta_J x^a\big)(s_2) = 0 \quad \text{for } 1 \le I \le m < J \le (n-1)\,. \tag{5.67}$$

As $\delta_I p_a$ is determined by (5.51), with the index $I$ affixed to all variational
derivatives, (5.67) can be rewritten in matrix form as

for $1 \le I \le m < J \le (n-1)$. Moreover, (5.53) and (5.55) imply that

(5.68) and (5.69) demonstrate that, with respect to a positive definite matrix,
the space spanned by

$$(\delta_1\dot{x}^a(s_2)), \dots, (\delta_m\dot{x}^a(s_2)) \tag{5.70}$$

is orthogonal to the space spanned by

$$(\delta_{m+1}x^a(s_2)), \dots, (\delta_{n-1}x^a(s_2)), (\dot{x}^a(s_2))\,. \tag{5.71}$$

By (5.62) and (5.63), this implies that the vectors

$$(\delta_1\dot{x}^a(s_2)), \dots, (\delta_m\dot{x}^a(s_2)), (\delta_{m+1}x^a(s_2)), \dots, (\delta_{n-1}x^a(s_2)), (\dot{x}^a(s_2)) \tag{5.72}$$

are linearly independent.
Now we define

$$\delta_I y^a(s) = \begin{cases} \dfrac{\delta_I x^a(s)}{s - s_2} & \text{if } s \neq s_2 \\[8pt]
\delta_I\dot{x}^a(s) & \text{if } s = s_2 \end{cases}
\qquad \text{for } I = 1, \dots, m\,,$$

$$\delta_J y^a(s) = \delta_J x^a(s) \qquad \text{for } J = m+1, \dots, (n-1)\,. \tag{5.73}$$

The Bernoulli-l’Hôpital rule guarantees that not only the $\delta_J y^a$ but also the
$\delta_I y^a$ are continuous functions of the parameter $s$. We have just proven that
$(\delta_1 y^a(s)), \dots, (\delta_{n-1}y^a(s)), (\dot{x}^a(s))$ are linearly independent for $s = s_2$. By
continuity, the same must be true for $s \neq s_2$ as long as $|s - s_2|$ is sufficiently
small. As a consequence, we can read from (5.73) that the vectors
$(\delta_1 x^a(s)), \dots, (\delta_{n-1}x^a(s)), (\dot{x}^a(s))$ are linearly independent for $s \neq s_2$ and
$|s - s_2|$ sufficiently small, i.e., that at those points the Jacobi fields span an
$(n-1)$-dimensional space transverse to the tangent vector. This completes
the proof for the case that $\xi$ can be covered by the domain of an appropriate
natural chart and a Hamiltonian.

In the general case one has to use a patching of several charts and to
evaluate all relevant differential equations piecewise. □



It is important to realize that part (c) of this proposition is not true without
the positive-definiteness assumption. A counter-example can be found in
an article by Helfer [62]. What Helfer constructs is a space-like geodesic in a
Lorentzian manifold along which a whole interval is conjugate to some point.
This example can be translated into our terminology by considering a Hamiltonian
of the form $H(x,p) = \frac{1}{2}\big(g^{ab}(x) p_a p_b - 1\big)$, with $g^{ab}$ the contravariant
components of a Lorentzian metric. Read in this way, Helfer's construction
gives a ray of a strongly regular ray-optical structure along which a whole
interval is conjugate to some point. It should be noted that Helfer's metric
is of class $C^\infty$ but not analytic. As a matter of fact, it is not difficult to verify
that for analytic Hamiltonians part (c) of Proposition 5.6.3 is true even
without the positive-definiteness assumption. In other words, in the analytic
category it is true that conjugate points are isolated along every ray of a
strongly regular ray-optical structure. To prove this, it suffices to observe that, in
the notation used in the proof of Proposition 5.6.3, the determinant



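$$D(s) = \det \begin{pmatrix} \delta_1 x^1(s) & \cdots & \delta_{n-1} x^1(s) & \dot{x}^1(s) \\ \vdots & & \vdots & \vdots \\ \delta_1 x^n(s) & \cdots & \delta_{n-1} x^n(s) & \dot{x}^n(s) \end{pmatrix} \tag{5.74}$$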



must be analytic if the Hamiltonian is analytic. Hence, if D does not vanish 
identically, then its zeros must be isolated. We shall come back to Proposi- 
tion 5.6.3 in Sect. 7.5 below. 

We end this section with a short remark on the characterization of caustics
in terms of (lifted) wave surfaces, rather than in terms of (lifted) rays. To
that end we consider a generalized solution $\mathcal{L}$ of the eikonal equation of a
ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ and we assume that the Euler vector field $E$ is
nowhere tangent to $\mathcal{L}$. By Definition 5.5.3, only in this case is the notion of
lifted wave surfaces defined. By Definition 5.6.1, $u \in \mathcal{L}$ is a critical point of $\mathcal{L}$
if and only if there is a non-zero vertical vector $Z_u \in T_u\mathcal{L}$. Since all vertical
vectors are in the kernel of the canonical one-form $\theta$, this vector must be
tangent to the lifted wave surface $S$ through $u$. Hence, the existence of such
a vector $Z_u$ indicates that the restriction to $S$ of the projection $\tau^*_{\mathcal{M}}$ cannot
be an immersion at $u$, i.e., that the wave surface $\tau^*_{\mathcal{M}}(S)$ is not a codimension-one
submanifold of $\mathcal{M}$ near $\tau^*_{\mathcal{M}}(u)$. In other words, $\mathrm{Caust}_{\mathcal{L}}$ is the set of all points where
the generalized wave surfaces associated with $\mathcal{L}$ fail to be codimension-one
submanifolds.




6. Ray-optical structures 
on Lorentzian manifolds 



In Chap. 5 we have established the notion of a ray-optical structure on a
bare manifold $\mathcal{M}$. From now on we shall assume that there is a Lorentzian
metric $g$ given on $\mathcal{M}$. This metric is to be interpreted as a spacetime metric
in the sense of general relativity, although for the mathematical formalism it
will not be necessary to specialize to the case $\dim(\mathcal{M}) = 4$. We shall assume,
however, that $\dim(\mathcal{M}) > 2$ to exclude some pathologies.



6.1 The vacuum ray-optical structure 

The metric $g$ determines a distinguished ray-optical structure

$$\mathcal{N}^g = \{\, u \in \mathring{T}^*\mathcal{M} \mid g^{\#}(u,u) = 0 \,\} \tag{6.1}$$

on $\mathcal{M}$, just by the construction of Example 5.1.1 with $g_o = g$. We shall refer
to $\mathcal{N}^g$ as the vacuum ray-optical structure on $(\mathcal{M},g)$. The rays of $\mathcal{N}^g$ are
the $g$-light-like geodesics which are to be interpreted as vacuum light rays
according to general relativity. It is important to realize that any conformally
equivalent metric $\tilde{g}$ (i.e., any metric of the form $\tilde{g} = e^{2f} g$ with some $C^\infty$
function $f\colon \mathcal{M} \to \mathbb{R}$) determines the same vacuum ray-optical structure
as $g$. Up to conformal equivalence, $\mathcal{N}^g$ determines $g$ uniquely. Hence, the
causal structure of the Lorentzian manifold $(\mathcal{M},g)$ is completely coded in
the ray-optical structure $\mathcal{N}^g$.
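
The conformal invariance of the vacuum ray-optical structure can be checked numerically at a single point. The following is a minimal sketch, not taken from the text: it assumes Minkowski components for $g^{ab}$, writes the conformal factor as $e^{2f}$ purely for illustration, and uses hypothetical helper names (`quadratic_form`, `conformally_rescaled`, `in_vacuum_structure`).

```python
import math

# Contravariant Minkowski metric, signature (-,+,+,+); a stand-in for g^{ab} at one point.
G_INV = [[-1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]

def quadratic_form(g_inv, p):
    """Evaluate g^{ab} p_a p_b for a covector p."""
    return sum(g_inv[a][b] * p[a] * p[b] for a in range(4) for b in range(4))

def conformally_rescaled(g_inv, f_value):
    """Inverse of the rescaled metric e^{2f} g at a point where f equals f_value."""
    factor = math.exp(-2.0 * f_value)  # the inverse metric picks up e^{-2f}
    return [[factor * entry for entry in row] for row in g_inv]

def in_vacuum_structure(g_inv, p, tol=1e-12):
    """True if the non-zero covector p satisfies g^{ab} p_a p_b = 0 (up to rounding)."""
    return any(c != 0.0 for c in p) and abs(quadratic_form(g_inv, p)) < tol

null_covector = [-5.0, 3.0, 4.0, 0.0]      # -25 + 9 + 16 = 0
timelike_covector = [-5.0, 1.0, 0.0, 0.0]  # -25 + 1 < 0

rescaled = conformally_rescaled(G_INV, 0.7)
print(in_vacuum_structure(G_INV, null_covector), in_vacuum_structure(rescaled, null_covector))
print(in_vacuum_structure(G_INV, timelike_covector), in_vacuum_structure(rescaled, timelike_covector))
```

Membership in the null cone is unchanged by the rescaling because the quadratic form is only multiplied by a positive number.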

At each point $q \in \mathcal{M}$, $\mathcal{N}^g_q = \mathcal{N}^g \cap T^*_q\mathcal{M}$ consists of two connected components.
(Here our assumption $\dim(\mathcal{M}) > 2$ is essential.) Thus, $\mathcal{N}^g$ has either one or
two connected components. The Lorentzian manifold $(\mathcal{M},g)$ is called
time-orientable iff $\mathcal{N}^g$ has two connected components, cf., e.g., Sachs and Wu
[126], p. 24, or Wald [146], p. 189. In this case, each connected component of
$\mathcal{N}^g$ may be viewed as a ray-optical structure in its own right. One of them
gives rays with future-pointing momenta whereas the other one gives rays
with past-pointing momenta. If $(\mathcal{M},g)$ is not time-orientable, i.e., if $\mathcal{N}^g$ has
only one connected component, no such distinction can be made in a globally
consistent way.

Ray-optical structures $\mathcal{N}$ on $\mathcal{M}$ which are different from $\mathcal{N}^g$ are to be
interpreted as giving light propagation in a medium. If $\mathcal{N}$ is dilation-invariant
in the sense of Definition 5.4.1, the medium is called non-dispersive, otherwise
it is called dispersive. The following proposition characterizes the vacuum
ray-optical structure $\mathcal{N}^g$ in comparison to other ray-optical structures on
$\mathcal{M}$.

Proposition 6.1.1. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. Assume that
$\mathcal{N}$ is regular at all points $u \in \mathcal{N}$ in the sense of Definition 5.2.1 and that all
the rays of $\mathcal{N}$ are $g$-light-like. Then $\mathcal{N} \subseteq \mathcal{N}^g$. (That is to say, if $(\mathcal{M},g)$ is
not time-orientable, $\mathcal{N} = \mathcal{N}^g$; if $(\mathcal{M},g)$ is time-orientable, either $\mathcal{N} = \mathcal{N}^g$
or $\mathcal{N}$ is one of the two connected components of $\mathcal{N}^g$.)

Proof. Fix a point $q \in \mathcal{M}$. Since all the rays of the ray-optical structure $\mathcal{N}$
are $g$-light-like, $\mathcal{N}_q = \mathcal{N} \cap T^*_q\mathcal{M}$ is a light-like hypersurface of the Minkowski
space $(T^*_q\mathcal{M}, g^{\#}_q)$. By elementary Minkowski geometry, this implies that $\mathcal{N}_q$
is ruled by light-like straight lines. Since $\mathcal{N}_q$ is closed in $\mathring{T}^*_q\mathcal{M}$, any such line
either runs from infinity to infinity or it runs from the origin to infinity. We
shall prove that the first case is impossible. By contradiction, let us assume
that there is a light-like straight line $L$ in $\mathcal{N}_q$ that runs from infinity to infinity.
This means that all light-like straight lines in $\mathcal{N}_q$ which are infinitesimally
close to $L$ also have to run from infinity to infinity, without intersecting $L$.
We decompose the motion of those neighboring lines in the familiar way into
rotation, shear and expansion. Since the lines are surface forming, the rotation
vanishes. Since the neighboring lines have no intersection with $L$, shear and
expansion also vanish. (Non-vanishing shear gives an intersection with $L$ of
those neighboring lines that lie in the principal shear directions. In the case
of vanishing shear, non-vanishing expansion gives an intersection with $L$ of
all neighboring lines.) Hence, all the neighboring lines have to be parallel to
$L$. It is easy to verify that this conclusion contradicts our assumption that
$\mathcal{N}$ is everywhere regular. Thus, we have proven that $\mathcal{N}_q$ is ruled by light-like
straight lines running from the origin to infinity. As a consequence, $\mathcal{N}_q$ must
be a subset of $\mathcal{N}^g_q = \mathcal{N}^g \cap T^*_q\mathcal{M}$. □

This proposition can be rephrased in the following way. As long as regularity
is not violated, the velocity of light in a medium is necessarily different
from the vacuum velocity of light, at least for some rays. To demonstrate
that the regularity assumption is, indeed, necessary one can consider
Example 5.1.3 with a light-like vector field $U$.

For an arbitrary ray-optical structure on our Lorentzian manifold the rays 
can be time-like, light-like or space-like; they can even change their causal 
character from point to point. If we assume that rays can be used to transmit 
signals (and this, after all, is a basic idea of ray optics), the rules of general 
relativity prohibit space-like rays. We use the following terminology. 

Definition 6.1.1. A ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ is called causal with
respect to $g$ if all rays of $\mathcal{N}$ are everywhere $g$-time-like or $g$-light-like.

The ray-optical structures of Examples 5.1.1 and 5.1.2 are causal with
respect to $g$ iff the metric $g_o$ is “narrower” than $g$, i.e., iff the inequality
$g_o(X,X) < 0$ implies the inequality $g(X,X) < 0$. The ray-optical structures
of Examples 5.1.3 and 5.1.4 are causal with respect to $g$ iff the vector field $U$
satisfies the inequality $g(U,U) < 0$.

The question of whether or not all physically reasonable ray-optical struc- 
tures on a spacetime have to be causal is a bit subtle. If ray optics is viewed 
as an approximation scheme to wave optics, the energy of a wave field does 
not propagate exactly along rays; this is true only in an approximative sense. 
(We have discussed this issue in Sect. 2.7, at least for a special class of me- 
dia.) On the basis of this observation, non-causal rays are not necessarily to 
be discarded altogether as unphysical. In the next section we shall introduce 
the notions of phase velocity and group velocity for a ray-optical structure
$\mathcal{N}$, and we shall see that $\mathcal{N}$ is causal iff the group velocity does not exceed the
vacuum velocity of light. It is well known that there are physically relevant 
optical media in which the group velocity exceeds the vacuum velocity of 
light. For a comprehensive discussion of this issue we refer to Brillouin [22]. 



6.2 Observer fields, frequency, and redshift 

Several basic concepts of ray optics which are familiar from elementary text- 
books depend on the notion of frequency. For a ray-optical structure on our 
Lorentzian manifold (M,g) this notion can be introduced after a time-like 
vector field has been chosen. 

A time-like $C^\infty$ vector field, given on some open subset $\mathcal{U}$ of $\mathcal{M}$, will be
called a (local) observer field henceforth. In the case $\mathcal{U} = \mathcal{M}$ we speak of
a global observer field. This terminology refers to the fact that the integral
curves of such a vector field can be interpreted as the worldlines of observers.
A global observer field exists if and only if $(\mathcal{M},g)$ is time-orientable, see,
e.g., Wald [146], Lemma 8.1.1. In the following we assume that we have
an observer field $V$ given on some open subset $\mathcal{U}$ of $\mathcal{M}$ which satisfies the
normalization condition $g(V,V) = -1$. This normalization condition means
that the integral curves of $V$ are parametrized by proper time.

At each point $q \in \mathcal{U} \subseteq \mathcal{M}$ we write $V_q$ for the value at $q$ of the time-like
vector field $V$. We decompose the tangent space $T_q\mathcal{M}$ into the one-dimensional
time-like subspace spanned by $V_q$ and its $(n-1)$-dimensional
orthocomplement

$$H_q\mathcal{M} = \{\, Y \in T_q\mathcal{M} \mid g_q(V_q, Y) = 0 \,\} . \tag{6.2}$$

Similarly, we decompose the cotangent space $T^*_q\mathcal{M}$ into the one-dimensional
subspace spanned by the covector $g_q(V_q, \cdot\,)$ and its $(n-1)$-dimensional
orthocomplement

$$H^*_q\mathcal{M} = \{\, u \in T^*_q\mathcal{M} \mid u(V_q) = 0 \,\} . \tag{6.3}$$

As suggested by our notation, $H^*_q\mathcal{M}$ can be identified with the dual space of
$H_q\mathcal{M}$.

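Now let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. To each point $u \in \mathcal{N}$ we
assign the following quantities, denoting the footpoint of $u$ by $q$.

The frequency

$$\omega(u) = -u(V_q) \in \mathbb{R} \,; \tag{6.4}$$

the spatial wave covector

$$k(u) = u - \omega(u)\, g_q(V_q, \cdot\,) \in H^*_q\mathcal{M} \,; \tag{6.5}$$

the phase velocity

$$w(u) = \frac{\omega(u)}{\|k(u)\|^2}\, k(u) \in H^*_q\mathcal{M} \,; \tag{6.6}$$

the ray velocity or group velocity

$$v(u) = -\frac{\mathbb{F}H(u)}{g_q\big(V_q, \mathbb{F}H(u)\big)} - V_q \in H_q\mathcal{M} \,. \tag{6.7}$$

Here $\|\cdot\|$ denotes the norm induced on $H^*_q\mathcal{M}$ by our Lorentzian metric and
$\mathbb{F}H$ denotes the fiber derivative of a local Hamiltonian $H$ for $\mathcal{N}$.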

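In a natural chart these definitions take the following form.

$$\omega(x,p) = -p_a V^a(x) \,, \tag{6.8}$$

$$k_a(x,p) = p_a - \omega(x,p)\, g_{ab}(x)\, V^b(x) \,, \tag{6.9}$$

$$w_a(x,p) = \frac{\omega(x,p)\, k_a(x,p)}{g^{bc}(x)\, k_b(x,p)\, k_c(x,p)} \,, \tag{6.10}$$

$$v^a(x,p) = -\frac{\frac{\partial H}{\partial p_a}(x,p)}{g_{cd}(x)\, V^c(x)\, \frac{\partial H}{\partial p_d}(x,p)} - V^a(x) \,. \tag{6.11}$$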

If we evaluate (6.8) and (6.9) along a classical solution $S\colon \mathcal{U} \to \mathbb{R}$ of the
eikonal equation of $\mathcal{N}$, we reproduce the frequency function (2.38) and the
spatial wave covector field (2.39), respectively. (The parameter $\alpha$ can be
absorbed by a redefinition of the eikonal function $S$.) This justifies the
terminology. The name “phase velocity” for $w(u)$ refers, of course, to the same
situation. At each point $u \in dS(\mathcal{U})$, the covector $w(u)$ determines the spatial
velocity of the wave surfaces (= “phase surfaces”) with respect to the observer
field $V$. Geometrically, the norm of $w(u)$ is a measure for the pseudo-Euclidean
angle $\gamma$ between the covector $g_q(V_q, \cdot\,)$ and the covector $u$ according

to the formula sinh ^7 = (l - ||u;(i 6 )lP)’’^, as can be read from equation ( 6 . 6 ). 

The ray velocity (6.7), on the other hand, admits an obvious physical
interpretation if evaluated along a lifted ray. The direction of the vector $v(u)$
determines the spatial direction in which the ray is moving, viewed with the
eyes of an observer traveling along an integral curve of $V$; the norm of $v(u)$ is a
measure for the pseudo-Euclidean angle between the tangent vector to the ray
and the tangent vector to the observer's worldline, i.e., for the relative velocity
of the ray with respect to the observer field chosen. This justifies the term
“ray velocity”. To verify that, moreover, (6.7) is equivalent to the familiar
textbook definition of the “group velocity”, we proceed in the following way.
First we have to assume that $g_q(V_q, \mathbb{F}H(u)) \neq 0$ to make sure that $v(u)$, as
defined by (6.7), is non-singular. This condition is satisfied if and only if at
the point $u$ the straight line parallel to $g_q(V_q, \cdot\,)$ in $T^*_q\mathcal{M}$ is transverse to
$\mathcal{N}_q = \mathcal{N} \cap T^*_q\mathcal{M}$. (For causal ray-optical structures this transversality condition is
automatically satisfied.) Clearly, this is the case if and only if the manifold
$\mathcal{N}_q \subset T^*_q\mathcal{M} \cong H^*_q\mathcal{M} \times \mathbb{R}$ is the graph of a function $f\colon H^*_q\mathcal{M} \to \mathbb{R}$ locally
around $u$. (Globally, however, $\mathcal{N}_q$ need not be the graph of a single-valued
function. In typical cases, such as in our Examples 5.1.1, 5.1.2, and 5.1.4, $\mathcal{N}_q$
has several “branches”.) Then a quick calculation shows that $v(u)$, as defined
by equation (6.7), is equal to the differential $(df)_u \in (H^*_q\mathcal{M})^* = H_q\mathcal{M}$. In
other words, to calculate $v(u)$ we have to write the frequency as a function
of the spatial wave covector by means of the dispersion relation and we have
to calculate the gradient of this function. This is exactly the usual textbook
definition of the group velocity.
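
The equivalence between the ray velocity and the gradient of the dispersion relation can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not the author's computation: it takes Minkowski space, an observer at rest, and the sample Hamiltonian $H = \frac{1}{2}(\eta^{ab} p_a p_b + h)$ with a constant $h$, and it assumes the ray velocity has the form $v^a = -\frac{\partial H/\partial p_a}{g_{cd} V^c\, \partial H/\partial p_d} - V^a$. All function names are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]   # diagonal Minkowski metric (same entries for upper and lower indices)
V = [1.0, 0.0, 0.0, 0.0]      # observer at rest, normalized to g(V, V) = -1

def fiber_derivative(p):
    """dH/dp_a = eta^{ab} p_b for the sample Hamiltonian H = (eta^{ab} p_a p_b + h) / 2."""
    return [ETA[a] * p[a] for a in range(4)]

def ray_velocity(p):
    """Ray velocity v^a = -(dH/dp_a) / (g_cd V^c dH/dp_d) - V^a."""
    xdot = fiber_derivative(p)
    denom = sum(ETA[c] * V[c] * xdot[c] for c in range(4))
    return [-xdot[a] / denom - V[a] for a in range(4)]

def dispersion(k, h):
    """Positive-frequency branch omega(k) = sqrt(|k|^2 + h) of the dispersion relation H = 0."""
    return math.sqrt(sum(c * c for c in k) + h)

def gradient_of_dispersion(k, h, eps=1e-6):
    """Central finite-difference gradient of omega with respect to the spatial wave covector."""
    grad = []
    for i in range(3):
        up = list(k)
        up[i] += eps
        down = list(k)
        down[i] -= eps
        grad.append((dispersion(up, h) - dispersion(down, h)) / (2.0 * eps))
    return grad

h = 0.5
k = [0.3, 0.4, 1.2]
omega = dispersion(k, h)
p = [-omega] + k              # p = k + omega * g(V, .), so p_0 = -omega
v = ray_velocity(p)
grad = gradient_of_dispersion(k, h)
print(v[1:], grad)            # the spatial parts should agree up to discretization error
```

The time component of `v` vanishes, so `v` lies in the observer's rest space, and its spatial components match the finite-difference gradient.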

By (6.6), the phase velocity has a zero at points $u \in \mathcal{N}$ where the
frequency vanishes, and it has a singularity at points $u \in \mathcal{N}$ where the spatial
wave covector vanishes. Either case is to be viewed as a pathological behavior
and indicates a “bad” choice of observer. (If no other choice is possible,
the ray-optical structure is to be viewed as “bad”.) Similarly, we can read
from (6.7) that the ray velocity has a zero at points $u \in \mathcal{N}$ where $\mathbb{F}H(u)$ is
parallel to our observer field and that it has a singularity at points $u \in \mathcal{N}$
where $\mathbb{F}H(u)$ is orthogonal to our observer field. The first case indicates a
“bad” choice of observer, whereas the second case cannot happen if our
ray-optical structure is causal in the sense of Definition 6.1.1. As a matter of
fact, a ray-optical structure is causal if and only if $\|v(u)\| \leq 1$ for all $u \in \mathcal{N}$.
Here $\|\cdot\|$ denotes the norm induced by our Lorentzian metric on the vector
space $H_q\mathcal{M}$ at the point $q = \tau^*_{\mathcal{M}}(u)$. In other words, a ray-optical structure
is causal if and only if the ray velocity (= group velocity) is bounded by the
vacuum velocity of light.

Note that basically the phase velocity is a spatial covector whereas the
ray velocity is a spatial vector. We can, of course, use the metric $g$ to identify
vectors and covectors. In particular, we can use the metric to make the spatial
wave covector into a vector, i.e., we can introduce, in terms of a natural chart,
the quantity

$$k^a(x,p) = g^{ab}(x)\, k_b(x,p) \,. \tag{6.12}$$

This vector is sometimes called the vector of normal slowness, following Sir
William R. Hamilton. Here, “normal” refers to the fact that, along any classical
solution of the eikonal equation, this vector is orthogonal to all spatial
vectors which are tangent to the wave surfaces; “slowness” refers to the fact
that the phase velocity decreases if the length of this vector increases.

The properties of a specific ray-optical structure are nicely visualized by
indicating, for a fixed frequency, the phase velocity and the group velocity for
each spatial direction. To make this idea precise we fix a ray-optical structure
$\mathcal{N}$, a normalized observer field $V$, a point $q \in \mathcal{M}$, and a real number $\omega_o \in \mathbb{R}$.
Then we define the sets

$$\mathfrak{F}^* = \{\, w(u) \mid u \in \mathcal{N}_q,\ \omega(u) = \omega_o \,\} \subset H^*_q\mathcal{M} \subset T^*_q\mathcal{M} \,, \tag{6.13}$$

$$\mathfrak{I} = \{\, v(u) \mid u \in \mathcal{N}_q,\ \omega(u) = \omega_o \,\} \subset H_q\mathcal{M} \subset T_q\mathcal{M} \,. \tag{6.14}$$

$\mathfrak{F}^*$ is usually called the figuratrix of the medium whereas $\mathfrak{I}$ is called the
indicatrix. Both names originate from variational calculus.

In general, figuratrix and indicatrix depend on the frequency value $\omega_o$,
i.e., if we switch from $\omega_o$ to $c\,\omega_o$ with some real number $c$, figuratrix and
indicatrix undergo a deformation. If we restrict ourselves to the case $c > 0$
(i.e., if we fix the sign of the frequency), there is no such deformation, for
any observer field and for any point $q \in \mathcal{M}$, if and only if our ray-optical
structure is dilation invariant. In other words, the dilation invariant case
is characterized by the property that phase velocity and group velocity are
independent of the frequency, as long as the sign of the frequency is fixed. This
is the defining property of a non-dispersive medium according to standard
textbooks on optics. We have thus justified our earlier claim that for a
ray-optical structure on a Lorentzian manifold the attributes “dilation invariant”
and “non-dispersive” are synonymous.

As an example, we consider a Hamiltonian of the form

$$H(x,p) = \tfrac{1}{2}\big( g^{ab}(x)\, p_a p_b + h(x) \big) \tag{6.15}$$

in natural coordinates, which comprises Example 5.1.1 ($h(x) = 0$ and
$g_o^{ab}(x) = g^{ab}(x)$) and Example 5.1.2 ($h(x) \neq 0$ and $g_o^{ab}(x) = g^{ab}(x)/h(x)$).

Then the figuratrix at the point with coordinates $x$ is a sphere of radius
$|\omega_o| / \sqrt{\omega_o^2 - h(x)}$ around the origin in $H^*_q\mathcal{M}$, whereas the indicatrix is a sphere
of radius $\sqrt{\omega_o^2 - h(x)} / |\omega_o|$ around the origin in $H_q\mathcal{M}$.

For a pathological ray-optical structure of the type given in Example 5.1.3,
on the other hand, the figuratrix is a sphere through the origin and the
indicatrix is a single point.
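
To make the two spheres concrete, the following hedged sketch evaluates the norms of phase and group velocity for the Hamiltonian $\frac{1}{2}(g^{ab} p_a p_b + h)$ at one point. It assumes Minkowski components for $g^{ab}$, an observer at rest, and a constant sample value of $h$; the helper names are hypothetical. For this Hamiltonian the dispersion relation reads $|k|^2 = \omega^2 - h$, so the norms should be $|\omega_o|/\sqrt{\omega_o^2 - h}$ and $\sqrt{\omega_o^2 - h}/|\omega_o|$ in every spatial direction.

```python
import math

def on_shell_covector(omega, direction, h):
    """Covector p = (p_0, p_1, p_2, p_3) with frequency omega solving eta^{ab} p_a p_b + h = 0."""
    k_norm = math.sqrt(omega**2 - h)                  # from -omega^2 + |k|^2 + h = 0
    length = math.sqrt(sum(d * d for d in direction))
    return [-omega] + [k_norm * d / length for d in direction]

def phase_velocity_norm(p):
    """|w| = |omega| / |k| for w_a = omega k_a / |k|^2."""
    return abs(p[0]) / math.sqrt(sum(c * c for c in p[1:]))

def group_velocity_norm(p):
    """|v| = |k| / |omega|, since v^i = k_i / omega for this Hamiltonian."""
    return math.sqrt(sum(c * c for c in p[1:])) / abs(p[0])

omega_o, h = 2.0, 1.0
directions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 2.0, -2.0]]
figuratrix_radii = [phase_velocity_norm(on_shell_covector(omega_o, d, h)) for d in directions]
indicatrix_radii = [group_velocity_norm(on_shell_covector(omega_o, d, h)) for d in directions]
print(figuratrix_radii)   # same value in every direction: a sphere
print(indicatrix_radii)
```

In this example the two radii are reciprocal to each other, so a phase velocity above the vacuum velocity of light goes along with a group velocity below it.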

Now we turn to the question of how the frequency changes along a ray. In
other words, we want to discuss the general relativistic redshift (or blueshift)
for light propagation in media. Along any curve in $\mathcal{N}$, given in terms of a
natural chart as a map $s \mapsto (x(s),p(s))$, differentiation of the frequency
function (6.8) yields

$$\frac{d}{ds}\,\omega\big(x(s),p(s)\big) = -\dot{p}_a(s)\, V^a\big(x(s)\big) - p_a(s)\, \frac{\partial V^a}{\partial x^b}\big(x(s)\big)\, \dot{x}^b(s) \,. \tag{6.16}$$
If we introduce the canonical lift $\hat{V}$ of the observer field $V$ according to (5.24),
equation (6.16) can be rewritten in coordinate-free form as

$$\frac{d}{ds}\,\omega\big(x(s),p(s)\big) = \Omega_{\xi(s)}\big(\dot{\xi}(s), \hat{V}_{\xi(s)}\big) \tag{6.17}$$

for any $C^\infty$ curve $\xi\colon I \to \mathcal{N}$. Since $\mathcal{N}$ is transverse to the fibers of $T^*\mathcal{M}$,
$\hat{V}$ can be decomposed, at each point of $\mathcal{N}$, into a vector tangent to $\mathcal{N}$ and a
vector tangent to the fiber. (This decomposition is, of course, not unique.) If
$\xi$ is a lifted ray, only the component tangent to the fiber gives a contribution
to (6.17). Therefore, it is justified to say that a $C^\infty$ curve $\xi\colon I \to \mathcal{N}$ satisfies
the redshift law of lifted rays with respect to the observer field $V$ if

$$\Omega_{\xi(s)}\big(\dot{\xi}(s), Q(s)\big) = 0 \tag{6.18}$$

for all $C^\infty$ maps $Q\colon I \to T\mathcal{N}$ with $Q(s) \in T_{\xi(s)}\mathcal{N}$ and
$T\tau^*_{\mathcal{M}}\big(Q(s)\big) = V_{\tau^*_{\mathcal{M}}(\xi(s))}$. The redshift law for lifted rays can be expressed more
conveniently if we use a Hamiltonian $H$ for $\mathcal{N}$. Then a lifted ray $\xi\colon I \to \mathcal{N}$
with a parametrization adapted to $H$ (i.e., such that (5.9) holds with $k = 1$)
satisfies, by (6.17), the redshift law

$$\frac{d}{ds}\,\omega\big(x(s),p(s)\big) = (dH)_{\xi(s)}\big(\hat{V}_{\xi(s)}\big) \,. \tag{6.19}$$

To illustrate the redshift law with an example, we consider a ray-optical
structure that is generated by a Hamiltonian of the form

$$H(x,p) = \tfrac{1}{2}\, g_o^{ab}(x)\, p_a p_b + c \tag{6.20}$$

where $g_o^{ab}$ are the contravariant components of a Lorentzian metric $g_o$ and
$c$ is a real constant. Our Examples 5.1.1 and 5.1.2 are of this form. A lifted
ray of such a ray-optical structure gives a geodesic of the metric $g_o$ if projected
to $\mathcal{M}$. The parametrization is adapted to $H$ if this geodesic is affinely
parametrized and if, in addition,

$$p_a(s) = (g_o)_{ab}\big(x(s)\big)\, \dot{x}^b(s) \,. \tag{6.21}$$

In this situation, the frequency with respect to a normalized observer field $V$
is given by

$$\omega\big(x(s),p(s)\big) = -(g_o)_{ab}\big(x(s)\big)\, \dot{x}^b(s)\, V^a\big(x(s)\big) \tag{6.22}$$

along the lifted ray.

If we switch to coordinate-free notation, denoting the lifted ray by $\xi\colon I \to \mathcal{N}$
and its projection to $\mathcal{M}$ by $\lambda = \tau^*_{\mathcal{M}} \circ \xi$, (6.22) implies

$$\frac{\omega\big(\xi(s_2)\big)}{\omega\big(\xi(s_1)\big)} = \frac{(g_o)_{\lambda(s_2)}\big(\dot{\lambda}(s_2), V_{\lambda(s_2)}\big)}{(g_o)_{\lambda(s_1)}\big(\dot{\lambda}(s_1), V_{\lambda(s_1)}\big)} \tag{6.23}$$

for any two parameter values $s_1, s_2 \in I$. Please note that $\xi$ enters into the
right-hand side of (6.23) only in terms of its projection $\lambda$. As (6.23) remains
true after an affine reparametrization, it gives the redshift along any ray
which is parametrized affinely (as a geodesic of $g_o$).
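
As a small sanity check of the redshift formula, the sketch below evaluates the ratio of metric products for a light ray in Minkowski space, where an affinely parametrized light-like geodesic has a constant tangent. It is an illustration under assumptions not made in the text: $g_o = g$ is the Minkowski metric, the emitter is at rest, and the receiver moves with speed $\beta$ along the ray direction. The function names are hypothetical. The result should reproduce the special-relativistic Doppler factor $\sqrt{(1-\beta)/(1+\beta)}$.

```python
import math

def minkowski(X, Y):
    """Minkowski inner product with signature (-,+,+,+)."""
    return -X[0] * Y[0] + sum(x * y for x, y in zip(X[1:], Y[1:]))

def observer(beta):
    """Normalized four-velocity of an observer moving with speed beta along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [gamma, gamma * beta, 0.0, 0.0]

def frequency_ratio(tangent, V_emit, V_recv):
    """omega(s_2) / omega(s_1) as the ratio g(tangent, V) at reception and emission."""
    return minkowski(tangent, V_recv) / minkowski(tangent, V_emit)

tangent = [1.0, 1.0, 0.0, 0.0]   # constant tangent of a light-like geodesic along the x-axis
beta = 0.6
ratio = frequency_ratio(tangent, observer(0.0), observer(beta))
doppler = math.sqrt((1.0 - beta) / (1.0 + beta))
print(ratio, doppler)            # both values should agree
```

A receiver moving away from the source along the ray sees a lower frequency, as expected.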

The redshift formula (6.23) applies, in particular, to the vacuum ray-optical
structure, where we have to read $g_o = g$. In this special context
equation (6.23) is well known. It was found by Kermack, McCrea and Whittaker
[71] and rediscovered by Schrödinger [129]. A particularly clear derivation was
given by Brill [20]. As an alternative, the vacuum redshift formula can also
be expressed in terms of acceleration, expansion and shear of the observer
field $V$. Details can be found in articles by Ehlers [37], by Hasse and Perlick
[58] and by Perlick [107].

Now we turn back to arbitrary ray-optical structures. The following propo- 
sition characterizes the redshift-free case. 

Proposition 6.2.1. Consider a ray-optical structure $\mathcal{N}$ and a global observer
field $V$ with $g(V,V) = -1$ on $\mathcal{M}$. Then the following two properties
are equivalent.

(a) The frequency with respect to $V$ is constant along each lifted ray $\xi$ of $\mathcal{N}$.

(b) $V \in \mathfrak{G}_{\mathcal{N}}$. (Please recall Definition 5.3.2 of the symmetry algebra $\mathfrak{G}_{\mathcal{N}}$.)

Proof. The general redshift formula (6.17) implies that (a) is true if and
only if the function $\Omega(X,\hat{V})$ vanishes identically on $\mathcal{N}$ whenever $X$ is a
characteristic vector field on $\mathcal{N}$. Since, at each point of $\mathcal{N}$, the kernel of
$\Omega(X, \cdot\,)$ coincides with the tangent space to $\mathcal{N}$, this condition is satisfied if
and only if $\hat{V}$ is tangent to $\mathcal{N}$ at all points of $\mathcal{N}$. By Definition 5.3.2, this is
equivalent to (b). □

For an arbitrary ray-optical structure $\mathcal{N}$ on $\mathcal{M}$ the symmetry algebra $\mathfrak{G}_{\mathcal{N}}$
need not contain a time-like vector field normalized to $g(V,V) = -1$. Thus,
only in very special cases is it possible to find a normalized observer field $V$
such that the frequency is constant along all lifted rays.

Proposition 6.2.1 can be specialized to our standard examples for which
we have analyzed the symmetries in Sect. 5.3. This gives the following results.
For the ray-optical structures of Example 5.1.1 the frequency with respect
to $V$ is constant along each lifted ray iff $V$ is a conformal Killing vector field
of the metric $g_o$, i.e., iff the Lie derivative $L_V g_o$ is a multiple of $g_o$. (In the
vacuum case $g_o = g$, the normalization condition $g(V,V) = -1$ then requires
$V$ to be a Killing vector field, i.e., $L_V g = 0$.) For the ray-optical structures of
Example 5.1.2 the frequency with respect to $V$ is constant along each lifted
ray iff $V$ is a Killing vector field of the metric $g_o$, i.e., iff $L_V g_o = 0$. For the
ray-optical structures of Example 5.1.3 the frequency with respect to $V$ is
constant along each lifted ray iff the Lie bracket $[V,U]$ is a multiple of $U$.
Finally, for the ray-optical structures of Example 5.1.4 the frequency with
respect to $V$ is constant along each lifted ray iff $[V,U] = 0$.

It is also interesting to consider normalized observer fields which are not in
$\mathfrak{G}_{\mathcal{N}}$ but can be rescaled to give an element of $\mathfrak{G}_{\mathcal{N}}$. In this case the redshift does
not vanish but it admits a representation in terms of a “redshift potential”.
This situation, which is of particular interest in cosmology, is characterized
by the following proposition.

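Proposition 6.2.2. Consider a ray-optical structure $\mathcal{N}$ and a global observer
field $V$ with $g(V,V) = -1$ on $\mathcal{M}$. Then for any $C^\infty$ function
$f\colon \mathcal{M} \to \mathbb{R}$ the following two properties are equivalent.

(a) $f$ is a redshift potential in the following sense. If $\xi\colon I \to \mathcal{N}$ is a lifted
ray of $\mathcal{N}$ with projection $\lambda = \tau^*_{\mathcal{M}} \circ \xi$, the frequency $\omega$ with respect to $V$
satisfies

$$e^{f \circ \lambda}\; \omega \circ \xi = \text{const.} \tag{6.24}$$

(b) $e^f V \in \mathfrak{G}_{\mathcal{N}}$.

Proof. We write $W = e^f V$. Then the general redshift formula (6.17) implies

$$\frac{d}{ds}\Big( e^{f(\lambda(s))}\, \omega\big(\xi(s)\big) \Big) = \Omega_{\xi(s)}\big(\dot{\xi}(s), \hat{W}_{\xi(s)}\big) \tag{6.25}$$

as can be easily checked with the help of the coordinate expression (5.24) for
the canonical lift of a vector field. The right-hand side of (6.25) vanishes for all
lifted rays if and only if $\hat{W}$ is tangent to $\mathcal{N}$ at all points of $\mathcal{N}$, i.e., if and only
if $W \in \mathfrak{G}_{\mathcal{N}}$. (This argument is analogous to the proof of Proposition 6.2.1.) □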



If the frequency has no zeros, (6.24) implies

$$\ln \frac{\omega\big(\xi(s_2)\big)}{\omega\big(\xi(s_1)\big)} = f\big(\lambda(s_1)\big) - f\big(\lambda(s_2)\big) \tag{6.26}$$

for any two parameter values $s_1$ and $s_2$. It is this expression to which the
name “redshift potential” refers. By Proposition 6.2.2, a ray-optical structure
admits a redshift potential, for an appropriately chosen observer field, if and
only if there is a time-like vector field in the symmetry algebra $\mathfrak{G}_{\mathcal{N}}$. A
ray-optical structure with this property is called “stationary”, a notion we are
going to discuss in full detail in Sect. 6.5 below. For the vacuum ray-optical
structure the notion of a “redshift potential” (or “redshift function”) was
investigated in papers by Dautcourt [30] and by Hasse and Perlick [58]. Please
recall that for the vacuum ray-optical structure $\mathcal{N} = \mathcal{N}^g$ the condition
$W \in \mathfrak{G}_{\mathcal{N}}$ means that $W$ is a conformal Killing vector field of the metric $g$.
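
A standard cosmological illustration of the redshift potential may help here. It is offered as a sketch under assumptions not made in the text above: a Robertson-Walker metric with scale factor $a(t)$, the comoving observer field $V = \partial_t$, and the vacuum ray-optical structure. The symbols $a$, $t_1$, $t_2$ and $d\sigma^2$ (a time-independent spatial metric) are introduced only for this example.

```latex
% Assumed setting: Robertson-Walker spacetime with comoving observers.
g = -dt^2 + a(t)^2\, d\sigma^2 , \qquad V = \partial_t , \qquad
W = a(t)\, \partial_t = e^{f} V \quad \text{with} \quad f = \ln a(t) .

% W is a conformal Killing vector field of g, so e^{f} V lies in the symmetry
% algebra of the vacuum ray-optical structure and f is a redshift potential.
% Constancy of e^{f} \omega along each lifted ray then gives, for emission at
% time t_1 and reception at time t_2,
\ln \frac{\omega(\xi(s_2))}{\omega(\xi(s_1))} = \ln a(t_1) - \ln a(t_2)
\quad \Longleftrightarrow \quad
\frac{\omega(\xi(s_2))}{\omega(\xi(s_1))} = \frac{a(t_1)}{a(t_2)} .
```

This is the familiar statement that the frequency measured by comoving observers scales inversely with the scale factor.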


6.3 Isotropic ray-optical structures



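With the Lorentzian metric $g$ given on $\mathcal{M}$ we can define, for each point
$q \in \mathcal{M}$, the set of Lorentz transformations on the cotangent space $T^*_q\mathcal{M}$ by

$$\mathrm{Lor}(q) = \big\{\, \Lambda\colon T^*_q\mathcal{M} \to T^*_q\mathcal{M} \;\big|\; \Lambda \text{ is a linear automorphism with } g^{\#}_q\big(\Lambda(\,\cdot\,),\Lambda(\,\cdot\,)\big) = g^{\#}_q(\,\cdot\,,\,\cdot\,) \,\big\} \,. \tag{6.27}$$

The Lorentz transformations $\mathrm{Lor}(q)$ foliate the punctured cotangent space
$\mathring{T}^*_q\mathcal{M}$ into orbits. Here, a subset $Q$ of $\mathring{T}^*_q\mathcal{M}$ is called an orbit iff it is of the
form $Q = \{\Lambda(u) \mid \Lambda \in \mathrm{Lor}(q)\}$ for some $u \in \mathring{T}^*_q\mathcal{M}$. The geometry of the
orbits is sketched in Figure 6.1.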




Fig. 6.1. The orbits of $\mathrm{Lor}(q)$ are the $g$-light-like cone, a family of $g$-space-like
two-shell hyperboloids and a family of $g$-time-like one-shell hyperboloids.
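
The orbit structure can be probed numerically: a Lorentz transformation leaves the quadratic form on covectors unchanged, so it cannot move a covector from one type of orbit to another. The sketch below assumes Minkowski components at the point in question and uses one boost and one spatial rotation as sample elements of the Lorentz group; all names are hypothetical.

```python
import math

def g_sharp(u):
    """Quadratic form of the Minkowski metric on covectors, signature (-,+,+,+)."""
    return -u[0]**2 + u[1]**2 + u[2]**2 + u[3]**2

def boost_x(u, rapidity):
    """Boost mixing the time component with the first spatial component."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [ch * u[0] + sh * u[1], sh * u[0] + ch * u[1], u[2], u[3]]

def rotation_z(u, angle):
    """Spatial rotation in the plane of the first two spatial components."""
    c, s = math.cos(angle), math.sin(angle)
    return [u[0], c * u[1] - s * u[2], s * u[1] + c * u[2], u[3]]

def orbit_type(u, tol=1e-9):
    """Classify the orbit through u by the sign of the quadratic form."""
    q = g_sharp(u)
    if abs(q) < tol:
        return "light cone"
    return "two-shell hyperboloid" if q < 0 else "one-shell hyperboloid"

samples = [[-5.0, 3.0, 4.0, 0.0], [-2.0, 1.0, 0.0, 0.5], [-1.0, 2.0, 1.0, 0.0]]
for u in samples:
    moved = boost_x(rotation_z(u, 0.8), 0.5)
    print(orbit_type(u), "->", orbit_type(moved))
```

The covectors with negative quadratic form fill the two-shell hyperboloids, which are space-like as hypersurfaces, and those with positive quadratic form fill the one-shell hyperboloids.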



Please recall that we have defined the structure group of a ray-optical
structure in Definition 5.3.3. The next definition characterizes the situation
that, at a point $q \in \mathcal{M}$, the set of Lorentz transformations (6.27) is completely
contained in the structure group of $\mathcal{N}$.

Definition 6.3.1. A ray-optical structure $\mathcal{N}$ on $(\mathcal{M},g)$ is called Lorentz
invariant at a point $q \in \mathcal{M}$ iff $\Lambda(u) \in \mathcal{N}_q$ for all $u \in \mathcal{N}_q$ and $\Lambda \in \mathrm{Lor}(q)$. $\mathcal{N}$ is
called Lorentz invariant iff it is Lorentz invariant at all points $q \in \mathcal{M}$.

If a ray-optical structure $\mathcal{N}$ is Lorentz invariant at a point $q$, $\mathcal{N}_q$ must be
an orbit of $\mathrm{Lor}(q)$ or a union of several orbits which are separated by open
neighborhoods in $\mathring{T}^*_q\mathcal{M}$. If $\mathcal{N}$ is causal, only space-like and light-like orbits
come into question, i.e., the one-shell hyperboloids of Figure 6.1 are excluded.
The following proposition is rather trivial.

Proposition 6.3.1. Let $\mathcal{N}$ be a ray-optical structure on $(\mathcal{M},g)$ which is
dilation invariant and Lorentz invariant at all points $q \in \mathcal{M}$. Then $\mathcal{N}$ is the
vacuum ray-optical structure, $\mathcal{N} = \mathcal{N}^g$.

Proof. By assumption, $\mathcal{N}_q$ is an orbit of $\mathrm{Lor}(q)$ or a union of several orbits,
and $\mathcal{N}_q$ is dilation invariant. Clearly, the only codimension-one submanifold
$\mathcal{N}_q \subset \mathring{T}^*_q\mathcal{M}$ with these properties is the double cone $\mathcal{N}_q = \mathcal{N}^g_q$. □

This proposition implies that a non-dispersive medium necessarily has to 
break Lorentz invariance. 

The class of Lorentz invariant ray-optical structures is rather small. We
get much larger classes if we require invariance not under the full Lorentz
group but only under certain subgroups. If we fix a vector $U_q \in T_q\mathcal{M}$ with
$g_q(U_q, U_q) = -1$, we can consider the subgroup of spatial rotations

$$\mathrm{Rot}(U_q) = \big\{\, \Lambda \in \mathrm{Lor}(q) \;\big|\; \Lambda\big(g_q(U_q, \cdot\,)\big) = g_q(U_q, \cdot\,) \,\big\} \tag{6.28}$$

with respect to $U_q$. This gives rise to the following definition.

Definition 6.3.2. A ray-optical structure $\mathcal{N}$ on $(\mathcal{M},g)$ is called isotropic
at a point $q \in \mathcal{M}$ with respect to a normalized time-like vector $U_q \in T_q\mathcal{M}$
iff $\Lambda(u) \in \mathcal{N}_q$ for all $u \in \mathcal{N}_q$ and $\Lambda \in \mathrm{Rot}(U_q)$. $\mathcal{N}$ is called isotropic with
respect to a global observer field $U$ with $g(U,U) = -1$ iff $\mathcal{N}$ is isotropic at
all points $q \in \mathcal{M}$ with respect to the vector $U_q = U(q)$ (= $U$ evaluated at $q$).

Instead of “isotropic” one might use more precise attributes such as “spa- 
tially isotropic” or “invariant under spatial rotations” . For the sake of brevity, 
however, we stick with the terminology of Definition 6.3.2. 

If a ray-optical structure is Lorentz invariant at $q$, then it is in particular
isotropic at $q$ with respect to all normalized time-like vectors $U_q \in T_q\mathcal{M}$. In
addition, we already know some examples of isotropic ray-optical structures
that need not be Lorentz invariant. In Example 2.5.1 of Part I we have derived
the Hamiltonian

$$H(x,p) = \frac{1}{2} \left( \frac{g^{ab}(x) + U^a(x)\, U^b(x)}{n(x)^2} - U^a(x)\, U^b(x) \right) p_a p_b \tag{6.29}$$

for light propagation in a linear and isotropic electromagnetic medium, cf.
eq. (2.87). Clearly, this Hamiltonian generates a ray-optical structure which
is isotropic with respect to the rest system $U$ of the medium. We are now
going to prove that this example is much more general than it seems to be.
As a matter of fact, every isotropic ray-optical structure is locally generated
by a Hamiltonian of the form (6.29) provided that it is dilation invariant. If
it is not dilation invariant, the only modification is in the fact that the index
of refraction must be allowed to depend on the frequency, i.e., we have to
write $n\big(x, -U^a(x)\, p_a\big)$ instead of $n(x)$.

It needs a little bit of preparation to prove this fact. Let us assume that
the ray-optical structure $\mathcal{N}$ on $(\mathcal{M},g)$ is isotropic with respect to an observer
field $U$. For notational convenience we introduce a natural chart $(x,p)$ around
a point $u \in \mathcal{N}$. We have to assume that neither the frequency $\omega(x,p)$, given
by (6.8) with $V^a = U^a$, nor the spatial wave covector $k_a(x,p)$, given by (6.9)
with $V^a = U^a$, has a zero at $u$. Then there is a neighborhood $\mathcal{W} \subseteq T^*\mathcal{M}$
of $u$ on which frequency and spatial wave covector are different from zero. If
this neighborhood $\mathcal{W}$ has been chosen appropriately, our isotropy assumption
guarantees that the norm $\sqrt{g^{ab}(x)\, k_a(x,p)\, k_b(x,p)}$ of the spatial wave covector
must be a function of $x$ and $\omega(x,p)$ on $\mathcal{N} \cap \mathcal{W}$, as is nicely illustrated by
the orbit structure of Figure 6.1. Hence there is a strictly positive real valued
function $n$, defined on some subset of $\mathcal{M} \times \mathbb{R}$, such that

$$n\big(x,\omega(x,p)\big)\, |\omega(x,p)| = \sqrt{g^{ab}(x)\, k_a(x,p)\, k_b(x,p)} \tag{6.30}$$

on $\mathcal{N} \cap \mathcal{W}$. This construction locally assigns a frequency-dependent index of
refraction $n$ to any isotropic ray-optical structure. Comparison with (6.10)
shows that $n$ is reciprocal to the norm of the phase velocity. Since we use units
making the vacuum velocity of light equal to 1, this can be rephrased as saying
that $1/n$ gives the phase velocity in units of the vacuum velocity of light. It
is to be emphasized that, as long as there are no additional assumptions
on our isotropic ray-optical structure $\mathcal{N}$, the index of refraction is a local
concept. Here, “local” refers in particular to the necessity of restricting the
fibers of $T^*\mathcal{M}$. Globally, $n$ might be a “multi-valued function” corresponding
to various “branches” of $\mathcal{N}$.

On an appropriate neighborhood $\mathcal{W} \subseteq T^*\mathcal{M}$, at least, the index of refraction
is well-defined and can be used to introduce a local Hamiltonian

$$H(x,p) = \frac{1}{2} \left( \frac{g^{ab}(x) + U^a(x)\, U^b(x)}{n\big(x, -U^c(x)\, p_c\big)^2} - U^a(x)\, U^b(x) \right) p_a p_b \,. \tag{6.31}$$



It is an immediate consequence of (6.30) that $H$ vanishes on $\mathcal{N} \cap \mathcal{W}$. Moreover,
our assumption that the spatial wave covector has no zeros implies
that $(\partial H/\partial p_a)$ has no zeros on $\mathcal{N} \cap \mathcal{W}$. Hence, (6.31) gives, indeed, a local
Hamiltonian for our ray-optical structure $\mathcal{N}$. If $n$ is frequency-independent,
i.e., constant with respect to its second argument, the rays determined by
the Hamiltonian (6.31) are the light-like geodesics of a Lorentzian metric. In
the frequency-dependent case they are associated with a sort of “generalized
Lorentzian metric” investigated in detail by Miron and Kawaguchi [97].
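
The following sketch evaluates a Hamiltonian of the isotropic form $H = \frac{1}{2}\big[(g^{ab} + U^a U^b)/n^2 - U^a U^b\big] p_a p_b$ and checks that it vanishes exactly on covectors whose spatial part has norm $n\,|\omega|$. It is an illustration under assumptions: Minkowski components for $g^{ab}$, a medium at rest, and an invented dispersion law `sample_index` that is not taken from the text. All function names are hypothetical.

```python
import math

G_INV = [-1.0, 1.0, 1.0, 1.0]   # diagonal of the contravariant Minkowski metric
U = [1.0, 0.0, 0.0, 0.0]        # rest frame of the medium, g(U, U) = -1

def sample_index(omega):
    """Hypothetical frequency-dependent index of refraction (for illustration only)."""
    return 1.5 + 0.02 * omega**2

def isotropic_hamiltonian(p, n_func):
    """H = (1/2) [ (g^{ab} + U^a U^b) / n^2 - U^a U^b ] p_a p_b with n = n(-U^c p_c)."""
    omega = -sum(U[c] * p[c] for c in range(4))
    n = n_func(omega)
    total = 0.0
    for a in range(4):
        for b in range(4):
            g_ab = G_INV[a] if a == b else 0.0
            total += ((g_ab + U[a] * U[b]) / n**2 - U[a] * U[b]) * p[a] * p[b]
    return 0.5 * total

def on_shell_covector(omega, direction, n_func):
    """Covector with frequency omega whose spatial part has norm n(omega) |omega|."""
    length = math.sqrt(sum(d * d for d in direction))
    k_norm = n_func(omega) * abs(omega)
    return [-omega] + [k_norm * d / length for d in direction]

p_medium = on_shell_covector(2.0, [1.0, 2.0, 2.0], sample_index)
p_vacuum = [-2.0, 2.0, 0.0, 0.0]          # light-like for the Minkowski metric
print(isotropic_hamiltonian(p_medium, sample_index))   # close to zero
print(isotropic_hamiltonian(p_vacuum, sample_index))   # non-zero: not a ray covector of the medium
```

Because $n$ enters only through the frequency $-U^c p_c$, the same function covers the dispersive and the non-dispersive case; a constant `n_func` gives the frequency-independent Hamiltonian.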

To sum up, we have proven that any isotropic ray-optical structure is locally
generated by a Hamiltonian of the form (6.31) near any point of $\mathcal{N}$ where
frequency and spatial wave covector are non-vanishing. Here, frequency and
spatial wave covector are meant with respect to the normalized observer field
$U$ distinguished by the isotropy assumption. Hence, a ray-optical structure on
$(\mathcal{M},g)$ which is isotropic with respect to some given normalized observer field
$U$ is unambiguously characterized by an index of refraction $n\colon \mathcal{M} \times \mathbb{R} \to \mathbb{R}^+$,
on a neighborhood $\mathcal{W} \subseteq \mathring{T}^*\mathcal{M}$ near any point where the ray-optical structure
is well-behaved. It is easy to check that such a ray-optical structure is

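(a) causal iff

$$ \Big( n(x,\omega) + \omega\,\frac{\partial n}{\partial \omega}(x,\omega) \Big)^2 \ge 1 \, ; \tag{6.32} $$

(b) dilation-invariant iff

$$ \frac{\partial n}{\partial \omega}(x,\omega) = 0 \, ; \tag{6.33} $$

(c) Lorentz invariant iff $n$ is of the form

$$ n(x,\omega)^2 = 1 - \frac{h(x)}{\omega^2} \, . \tag{6.34} $$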



From (b) and (c) we can read an alternative proof of Proposition 6.3.1. (c) is
just a different way of saying that a Lorentz invariant ray-optical structure
is locally generated by a Hamiltonian of the form

$$ H(x,p) = \frac{1}{2} \big( g^{ab}(x)\,p_a\,p_b + h(x) \big) \, . \tag{6.35} $$

Please note that the Hamiltonian (3.46) for light propagation in a non- 
magnetized plasma is of this form. In this case the function h{x) is given 
by the plasma frequency (3.51), h{x) = Wp(x)^ = ^n{x). 
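
As a small numerical illustration of conditions (a) and (c), the following
sketch evaluates an index of refraction of the form (6.34) with a constant
$h = \omega_p^2$ at a fixed point $x$ and checks the causality condition in the
form $(n + \omega\,\partial n/\partial\omega)^2 \ge 1$. The function names, the
finite-difference step and the sample frequencies are illustrative assumptions
and do not come from the text.

```python
import math


def plasma_index(omega: float, omega_p: float) -> float:
    """n(omega) = sqrt(1 - omega_p**2 / omega**2); real only for omega > omega_p."""
    if omega <= omega_p:
        raise ValueError("n is real and positive only for omega > omega_p")
    return math.sqrt(1.0 - (omega_p / omega) ** 2)


def causality_lhs(omega: float, omega_p: float, eps: float = 1e-6) -> float:
    """(n + omega * dn/domega)**2, with dn/domega from a central difference."""
    dn = (plasma_index(omega + eps, omega_p)
          - plasma_index(omega - eps, omega_p)) / (2.0 * eps)
    return (plasma_index(omega, omega_p) + omega * dn) ** 2


if __name__ == "__main__":
    omega_p = 1.0  # hypothetical plasma frequency in arbitrary units
    for omega in (1.5, 2.0, 10.0):
        n = plasma_index(omega, omega_p)
        print(f"omega={omega}: n={n:.4f}, phase velocity={1.0 / n:.4f}, "
              f"causality lhs={causality_lhs(omega, omega_p):.4f}")
```

For this particular $n$ one finds analytically $n + \omega\,\partial n/\partial\omega = 1/n$,
so the left-hand side equals $1/n^2 \ge 1$: the phase velocity $1/n$ exceeds 1
while the causality condition is still satisfied.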



6.4 Light bundles in isotropic media 

In this section we investigate the dynamics of infinitesimal bundles of light
rays in an isotropic non-dispersive medium, thereby generalizing several
standard results of ordinary optics. For our purposes it will be necessary to
assume that the medium is irrotational and that it is globally characterized
by a single-valued index of refraction which has no zeros and no
singularities. According to the results of the preceding section, this implies
that the rays are exactly the light-like geodesics of a Lorentzian metric
$g_o$, called the “optical metric” henceforth. The contravariant components of
the optical metric are determined by writing (6.29) in the form
$H(x,p) = \frac{1}{2}\,g_o^{ab}(x)\,p_a\,p_b$, i.e., $g_o$ is related to the
spacetime metric $g$ by

$$ g_o(X,Y) = n^2\,g(X,Y) + (n^2 - 1)\,g(U,X)\,g(U,Y) \tag{6.36} $$

for all vector fields $X$ and $Y$ on $\mathcal{M}$. Here, $U$ is a given
observer field with $g(U,U) = -1$ on $\mathcal{M}$ which is supposed to be
hypersurface-orthogonal and $n \colon \mathcal{M} \to \mathbb{R}^+$ is a given
$C^\infty$ function. In the following we refer to $U$ as to the “rest system”
of the medium and to $n$ as to the “index of refraction”. These assumptions
include, of course, vacuum light propagation as the special case $n = 1$.
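
The relation (6.36) can be checked numerically in a simple setting. The sketch
below takes, purely as an assumption for illustration, Minkowski spacetime, a
constant index of refraction and a boosted constant observer field $U$. It
verifies that the matrix $(g^{ab} + U^a U^b)/n^2 - U^a U^b$ is inverse to the
optical metric and that a vector $K = U + E/n$, with $E$ a unit vector
orthogonal to $U$, is light-like with respect to $g_o$. All names and sample
values are hypothetical.

```python
import math

# Minkowski metric with signature (-,+,+,+); its inverse has the same components.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def lower(vec):
    """Lower the index of a vector with the Minkowski metric."""
    return [sum(ETA[a][b] * vec[b] for b in range(4)) for a in range(4)]


def optical_metric(n, U):
    """Covariant components n^2 g_ab + (n^2 - 1) U_a U_b, cf. (6.36)."""
    U_low = lower(U)
    return [[n ** 2 * ETA[a][b] + (n ** 2 - 1.0) * U_low[a] * U_low[b]
             for b in range(4)] for a in range(4)]


def inverse_optical_metric(n, U):
    """Contravariant components (g^ab + U^a U^b) / n^2 - U^a U^b."""
    return [[(ETA[a][b] + U[a] * U[b]) / n ** 2 - U[a] * U[b]
             for b in range(4)] for a in range(4)]


def contract(metric, X, Y):
    """Evaluate metric(X, Y) for covariant metric components."""
    return sum(metric[a][b] * X[a] * Y[b] for a in range(4) for b in range(4))


def matmul(A, B):
    return [[sum(A[a][c] * B[c][b] for c in range(4)) for b in range(4)]
            for a in range(4)]


U = [1.25, 0.75, 0.0, 0.0]      # boost with velocity 0.6, so g(U, U) = -1
E = [0.0, 0.0, 1.0, 0.0]        # unit vector orthogonal to U
n = 1.5                         # hypothetical constant index of refraction
g_o = optical_metric(n, U)
K = [U[a] + E[a] / n for a in range(4)]   # spatial speed 1/n relative to U
product = matmul(inverse_optical_metric(n, U), g_o)
null_norm = contract(g_o, K, K)
```

Here `product` agrees with the identity matrix and `null_norm` vanishes up to
rounding, while `optical_metric(1.0, U)` reproduces the Minkowski metric.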

Now let us fix a ray $\lambda \colon I \to \mathcal{M}$, i.e., a light-like
$g_o$-geodesic, where $I$ denotes a real interval. For the sake of simplicity
we choose an affine parametrization such that the tangent field
$K = \dot\lambda \colon I \to T\mathcal{M}$ of $\lambda$ satisfies the
equations

$$ g_o(K,K) = 0 \quad \text{and} \quad (\nabla_o)_K K = 0 \tag{6.37} $$

where $\nabla_o$ denotes the Levi-Civita connection of the metric $g_o$.
Infinitesimally neighboring rays are mathematically modeled by Jacobi fields
along $\lambda$. (Please recall that Jacobi fields are defined, for arbitrary
ray-optical structures, by Definition 5.6.2.) In the case at hand, a Jacobi
field along $\lambda$ is a $C^\infty$ map $J \colon I \to T\mathcal{M}$ with
$\tau_{\mathcal{M}} \circ J = \lambda$ that satisfies the following two
conditions.

$$ (\nabla_o)_K (\nabla_o)_K J - R_o(K,J,K) \ \text{is a multiple of } K \, , \tag{6.38} $$

$$ g_o(K,J) = \text{const.} \, , \tag{6.39} $$

where $R_o$ denotes the curvature tensor of the connection $\nabla_o$. (6.38)
assures that “the arrow-head of $J$ is tracing a neighboring $g_o$-geodesic”
and (6.39) assures that this neighboring geodesic is again $g_o$-light-like.

For analyzing the motion of such Jacobi fields it is convenient to refer
to an appropriate basis of vector fields along $\lambda$. We introduce the
following definition which makes sense for arbitrary curves $\lambda$ in our
Lorentzian manifold $(\mathcal{M},g)$.

Definition 6.4.1. Let $\lambda \colon I \to \mathcal{M}$ be a $C^\infty$ curve
and denote its tangent field by $K$. Then $(E_1, \dots, E_{n-2})$ is called a
Sachs bein along $\lambda$ iff for $A,B = 1,\dots,n-2$

(a) $E_A$ is a $C^\infty$ vector field along $\lambda$, i.e.,
$E_A \colon I \to T\mathcal{M}$ with $\tau_{\mathcal{M}} \circ E_A = \lambda$ ;

(b) $g(E_A,E_B) = \delta_{AB}$ and $g(K,E_A) = 0$ ;

(c) $g(E_A, \nabla_K E_B) = 0$ .

A Sachs bein is called adapted to an observer field $V$ iff the vector
$E_A(s)$ is $g$-orthogonal to the vector $V_{\lambda(s)}$ for all
$A = 1,\dots,n-2$ and all $s \in I$.
Whenever referring to a Sachs bein we use the summation convention for
capital indices $A, B, \dots$ running from 1 to $n-2$.

It is easy to check that, for an arbitrary curve $\lambda$ and an arbitrary
observer field $V$ on $\mathcal{M}$, there is a Sachs bein along $\lambda$
which is adapted to $V$ and that it is unique up to transformations of the form

$$ E_A(s) \longmapsto O^B{}_A\,E_B(s) \tag{6.40} $$

where $(O^B{}_A)$ is a constant orthogonal matrix, i.e.,
$O^C{}_A\,O^D{}_B\,\delta_{CD} = \delta_{AB}$. In the literature, the name
“Sachs bein” is usually restricted to the case that $\lambda$ is a light-like
geodesic with respect to $g$. In this situation, which was considered in the
original paper by Sachs [125], condition (b) of Definition 6.4.1 assures that
$K, E_1, \dots, E_{n-2}$ span the $g$-orthocomplement of $K$ and condition (c)
requires that, apart from the freedom of adding multiples of $K$, each $E_A$ is
$\nabla$-parallel along $\lambda$.

Here, however, we are considering the case that $\lambda$ is a light-like
geodesic of the optical metric rather than of the spacetime metric. In the
following we fix a Sachs bein along $\lambda$ that is adapted to the
distinguished observer field $U$, i.e., that satisfies in addition to (6.42)
and (6.43) the condition

$$ g(U,E_A) = g_o(U,E_A) = 0 \, . \tag{6.41} $$

With the help of (6.36) and (6.41), conditions (b) and (c) of Definition 6.4.1
can be rewritten in terms of the optical metric in the following way.

$$ g_o(E_A,E_B) = n^2\,\delta_{AB} \quad \text{and} \quad g_o(K,E_A) = 0 \, , \tag{6.42} $$

$$ (\nabla_o)_K \big( n^{-1} E_A \big) \ \text{is a multiple of } K \, . \tag{6.43} $$

Whereas (6.42) is obvious, it needs a bit of work to verify (6.43). One has to
calculate the difference tensor $\nabla - \nabla_o$ from (6.36) and to use the
assumption of $U$ being hypersurface-orthogonal (= irrotational).

In this situation every vector field $J$ along $\lambda$ can be represented as
a linear combination

$$ J(s) = J^A(s)\,E_A(s) + v(s)\,U_{\lambda(s)} + w(s)\,K(s) \, , \tag{6.44} $$

with scalar coefficients $J^A(s)$, $v(s)$ and $w(s)$. Jacobi fields are
determined by inserting (6.44) into (6.38) and (6.39). This gives conditions on
the coefficients $J^A$ and $v$ but not on $w$ because a Jacobi field remains a
Jacobi field if an arbitrary multiple of the tangent field is added. If
$v = 0$, $J$ is $g$-orthogonal to the observer field $U$ up to the irrelevant
term proportional to the tangent field, i.e., the connecting vector from
$\lambda$ to the neighboring ray is purely spatial with respect to the
distinguished observer field $U$. (In the vacuum case $n = 1$ this is true for
all observer fields simultaneously.) Hence, the dynamics of infinitesimally
thin bundles of light rays in our isotropic medium is given by inserting (6.44)
with $v = 0$ into (6.38) and (6.39). As (6.39) is automatically satisfied with
const. = 0, we only have to care about (6.38). This gives the system of second
order linear differential equations

$$ n(s)\,\frac{d^2}{ds^2}\big( n(s)\,J^A(s) \big) = S^A{}_B(s)\,J^B(s) \tag{6.45} $$

for the coefficients $J^A$ where

$$ S^A{}_B = \delta^{AC}\,g_o\big( E_C, R_o(K,E_B,K) \big) \, . \tag{6.46} $$

Here and in the following we write $n(s)$ for $n(\lambda(s))$. Owing to the
symmetries of the curvature tensor, $S^A{}_B$ satisfies the identity

$$ S^A{}_B\,\delta^{BC} = S^C{}_B\,\delta^{BA} \, . \tag{6.47} $$

If the Sachs bein is changed according to (6.40), $S^A{}_B$ undergoes the
transformation

$$ S^A{}_B(s) \longmapsto \delta^{AC}\,O^D{}_C\,S^F{}_G(s)\,O^G{}_B\,\delta_{DF} \, . \tag{6.48} $$

Now let us assume that we have a matrix valued function
$s \mapsto L(s) = (L^A{}_B(s))$ that satisfies the matrix analogue of the
differential equation (6.45), i.e.,

$$ n(s)\,\frac{d^2}{ds^2}\big( n(s)\,L^A{}_C(s) \big) = S^A{}_B(s)\,L^B{}_C(s) \, . \tag{6.49} $$

Then any $(c^1, \dots, c^{n-2}) \in \mathbb{R}^{n-2}$ determines a solution

$$ J^A(s) = L^A{}_B(s)\,c^B \tag{6.50} $$

of (6.45) and, upon inserting into (6.44) with $v = 0$ and $w$ arbitrary, a
solution $J$ of (6.38) and (6.39) with const. = 0. In other words, such a
matrix-valued function $L$ determines an $(n-2)$-parameter family of
infinitesimally neighboring rays around the central ray $\lambda$. If
$\det(L(s)) \neq 0$, those neighboring rays fill the space (not the
spacetime!) around $\lambda$ completely. For that reason, we call a solution
$L = (L^A{}_B)$ of (6.49) that satisfies $\det(L(s)) \neq 0$, with the possible
exception of some isolated parameter values $s$, an infinitesimal bundle of
rays around $\lambda$. More precisely, $L$ should be called the representation
of such a bundle with respect to the Sachs bein chosen. If we change the Sachs
bein, we have, of course, to change $L$ according to

$$ L^A{}_B(s) \longmapsto \delta^{AC}\,O^D{}_C\,L^F{}_B(s)\,\delta_{DF} \, . \tag{6.51} $$

Since $n$ has no zeros, (6.49) can be solved for $\ddot{L}^A{}_C(s)$. Thus,
arbitrary initial values for $L$ and for its first derivative determine a
unique solution.

Invertibility of $L(s)$ for almost all parameter values $s$ implies that the
equation

$$ D^A{}_B(s)\,L^B{}_C(s) = \dot{L}^A{}_C(s) \tag{6.52} $$

defines a matrix $D(s) = (D^A{}_B(s))$ for almost all parameter values $s$. (At
those isolated values $s$ where $\det(L(s)) = 0$, some components of $D$ become
infinite.) Hence, the $J^A$ from (6.50) satisfy

$$ \dot{J}^A(s) = D^A{}_B(s)\,J^B(s) \tag{6.53} $$

for almost all $s$. (6.53) demonstrates that the matrix $D(s)$ measures the
motion of the infinitesimal bundle around $\lambda$ with respect to the Sachs
bein chosen. If we decompose $D(s)$ into a symmetrical and an antisymmetrical
part according to

$$ D^A{}_B(s) = \theta^A{}_B(s) + \omega^A{}_B(s) \, , \tag{6.54} $$

$$ \theta^A{}_B(s)\,\delta^{BC} - \theta^C{}_B(s)\,\delta^{BA} = 0 \, , \tag{6.55} $$

$$ \omega^A{}_B(s)\,\delta^{BC} + \omega^C{}_B(s)\,\delta^{BA} = 0 \, , \tag{6.56} $$

the symmetrical part $\theta^A{}_B(s)$ gives the deformation and the
antisymmetrical part $\omega^A{}_B(s)$ gives the rotation of the infinitesimal
bundle with respect to the Sachs bein. The symmetrical part can be further
decomposed according to

$$ \theta^A{}_B(s) = \sigma^A{}_B(s) + \frac{1}{n-2}\,\theta(s)\,\delta^A_B \, , \tag{6.57} $$

where $\theta(s) = \theta^A{}_A(s)$ gives the expansion and
$\sigma^A{}_B(s)$, which is defined through (6.57), gives the shear of the
infinitesimal bundle with respect to the Sachs bein.
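
The decomposition of $D$ into expansion, shear and rotation is easy to carry
out numerically. The following sketch is illustrative only: it takes the shear
to be the trace-free part of the symmetrical part, represents matrices as
nested lists, and the function name `decompose` is an assumption.

```python
def decompose(D):
    """Split a square matrix D into (expansion, shear, rotation).

    The matrix size m plays the role of n - 2, the number of Sachs-bein vectors.
    """
    m = len(D)
    theta = [[0.5 * (D[a][b] + D[b][a]) for b in range(m)] for a in range(m)]
    rotation = [[0.5 * (D[a][b] - D[b][a]) for b in range(m)] for a in range(m)]
    expansion = sum(theta[a][a] for a in range(m))
    shear = [[theta[a][b] - (expansion / m if a == b else 0.0)
              for b in range(m)] for a in range(m)]
    return expansion, shear, rotation


expansion, shear, rotation = decompose([[1.0, 2.0], [0.0, 3.0]])
```

For this sample matrix the expansion is 4, the shear is the trace-free
symmetric matrix with rows $(-1, 1)$ and $(1, 1)$, and the rotation is the
antisymmetric matrix with rows $(0, 1)$ and $(-1, 0)$.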

It is easy to derive propagation equations for these quantities. If we
calculate the derivative of (6.52), we find that the second order differential
equation (6.49) for $L^A{}_B$ implies the first order differential equation

$$ \dot{D}^A{}_B(s) = \frac{1}{n(s)^2}\,S^A{}_B(s) - D^A{}_C(s)\,D^C{}_B(s) - \frac{2\,\dot{n}(s)}{n(s)}\,D^A{}_B(s) - \frac{\ddot{n}(s)}{n(s)}\,\delta^A_B \tag{6.58} $$

for $D^A{}_B$. Symmetrization respectively antisymmetrization results in

$$ \dot{\theta}^A{}_B(s) = \frac{1}{n(s)^2}\,S^A{}_B(s) - \theta^A{}_C(s)\,\theta^C{}_B(s) - \omega^A{}_C(s)\,\omega^C{}_B(s) - \frac{2\,\dot{n}(s)}{n(s)}\,\theta^A{}_B(s) - \frac{\ddot{n}(s)}{n(s)}\,\delta^A_B \, , \tag{6.59} $$

$$ \dot{\omega}^A{}_B(s) = - \omega^A{}_C(s)\,\theta^C{}_B(s) - \theta^A{}_C(s)\,\omega^C{}_B(s) - \frac{2\,\dot{n}(s)}{n(s)}\,\omega^A{}_B(s) \, . \tag{6.60} $$

For the vacuum case $n = 1$, the propagation equations (6.59) and (6.60) are
well-known and can be found in many textbooks on general relativity, see,
e.g., Wald [146], p. 222. In particular, the trace of (6.59) gives the
well-known focusing equation for vacuum light rays in the case $n = 1$. Please
note that in the generalization considered here $S^A{}_B(s)$ involves the
curvature tensor of the optical metric according to (6.46).

Now we use the propagation equations (6.59) and (6.60) to prove three 
theorems on light propagation in a general-relativistic medium of the kind 
under consideration and thus, in particular, in vacuum. All three have famous 
counterparts in ordinary optics.

Theorem 6.4.1. In an isotropic, dispersion-free and non-rotating medium 
the following holds true. If for an infinitesimal bundle of rays the rotation 
vanishes for one parameter value s, then it vanishes for all s. 

Proof. This is an immediate consequence of the fact that the rotation satisfies 
the homogeneous differential equation (6.60). □ 

This theorem can be viewed as a general relativistic analogue of the Malus 
theorem of ordinary optics. In its most elementary version, found by Malus in 
1808, this classical theorem can be formulated in the following way (cf., e.g.,
Born and Wolf [16]). A family of straight lines that starts surface-orthogonal
remains surface-orthogonal (a) after reflexion at an arbitrarily curved surface 
according to the usual reflexion law and (b) after refraction at an arbitrarily 
curved surface according to Snell’s law. In other words, a surface-orthogonal 
bundle of light rays remains surface-orthogonal after passing through any sys- 
tem of mirrors and lenses. Theorem 6.4.1 gives a similar statement for light 
rays in an isotropic, non-dispersive, and non-rotating medium on a general- 
relativistic spacetime. The analogy comes from the fact that a two-parameter 
family of straight lines in ordinary Euclidean 3-space is surface orthogonal 
iff, around any member of this family, the infinitesimally neighboring mem- 
bers are irrotational. In this case rotation is to be measured with respect to 
ordinary Euclidean parallel transport. 

The other two theorems refer to infinitesimal bundles of rays which are
homocentric, i.e., to the case that $L(s)$ is the zero matrix for one
particular parameter value $s$.

Theorem 6.4.2. In an isotropic, non-dispersive and non-rotating medium 
the following holds true. If an infinitesimal bundle of rays is homocentric, 
then its rotation vanishes. 

Proof. Let $L = (L^A{}_B)$ be a solution of the differential equation (6.49)
which can be rewritten in matrix notation as

$$ n(s)\,\frac{d^2}{ds^2}\big( n(s)\,L(s) \big) = S(s)\,L(s) \tag{6.61} $$

with $S = (S^A{}_B)$. Owing to (6.47), transposition of (6.61) results in

$$ n(s)\,\frac{d^2}{ds^2}\big( n(s)\,L^T(s) \big) = L^T(s)\,S(s) \tag{6.62} $$

where $(\,\cdot\,)^T$ denotes the transpose of a matrix. (6.61) and (6.62)
together imply that the matrix

$$ C(s) = n(s)^2 \big( \dot{L}^T(s)\,L(s) - L^T(s)\,\dot{L}(s) \big) \tag{6.63} $$

has vanishing derivative, $\dot{C}(s) = 0$. Hence, $C(s)$ is a constant matrix.
If $L$ is a homocentric infinitesimal bundle, $L(s_0)$ is the zero matrix for
some specific parameter value $s_0$. So $C(s) = C(s_0)$ must be the zero matrix
for all $s$. With $C = 0$, we multiply the matrix equation (6.63) from the left
by $(L^{-1})^T(s)$ and from the right by $L^{-1}(s)$. This can be done for
almost all parameter values $s$. By (6.52), the resulting equation reads
$D^T(s) - D(s) = 0$, i.e., the antisymmetrical part of $D(s)$ vanishes. □

This theorem is analogous to the elementary fact that, in ordinary optics, 
homocentric bundles are surface-orthogonal. 

We now turn to the last of our three theorems which is of particular 
relevance for cosmology. 

Theorem 6.4.3. (Reciprocity theorem) In an isotropic, dispersion-free
and non-rotating medium, the following holds true. If $L_1$ and $L_2$ are two
infinitesimal bundles around the same central ray $\lambda$ and if both $L_1$
and $L_2$ are homocentric, with $L_1(s_1) = 0$ and $L_2(s_2) = 0$, then

$$ \frac{|\det( n(s_2)\,L_1(s_2) )|}{|\det( n(s_1)\,\dot{L}_1(s_1) )|} = \frac{|\det( n(s_1)\,L_2(s_1) )|}{|\det( n(s_2)\,\dot{L}_2(s_2) )|} \, . \tag{6.64} $$

Proof. We use the same matrix notation as in the proof of Theorem 6.4.2. By
assumption, the differential equation (6.61) and its transposed version (6.62)
are satisfied by $L = L_1$ and $L = L_2$. This implies that the matrix

$$ C_{12}(s) = n(s)^2 \big( \dot{L}_1^T(s)\,L_2(s) - L_1^T(s)\,\dot{L}_2(s) \big) \tag{6.65} $$

has vanishing derivative, $\dot{C}_{12}(s) = 0$. Hence, $C_{12}(s)$ is a
constant matrix. In particular, the equation

$$ C_{12}(s_1) = C_{12}(s_2) \tag{6.66} $$

has to hold. Owing to our hypothesis $L_1(s_1) = 0$ and $L_2(s_2) = 0$, (6.66)
simplifies to

$$ n(s_1)^2\,\dot{L}_1^T(s_1)\,L_2(s_1) = - n(s_2)^2\,L_1^T(s_2)\,\dot{L}_2(s_2) \, . \tag{6.67} $$

(6.64) is an obvious consequence of (6.67). □

Please note that the denominators in (6.64) are different from zero since
$L_1$ and $L_2$ are infinitesimal bundles and $n$ is strictly positive.
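
Theorems 6.4.2 and 6.4.3 can be checked numerically on a toy example. The
sketch below assumes that the bundle equation (6.61) has the form
$n\,\frac{d^2}{ds^2}(nL) = SL$, works with $M = nL$ and $P = \dot{M}$, so that
$\dot{P} = SM/n^2$, and integrates with a classical Runge-Kutta scheme. The
functions `n_of` and `S_of`, the initial data and the step count are
hypothetical choices; `S_of` is just some symmetric matrix and is not derived
from an actual curvature tensor.

```python
import math


def n_of(s):
    """Hypothetical index of refraction along the ray (strictly positive)."""
    return 1.0 + 0.2 * math.sin(s)


def S_of(s):
    """Hypothetical symmetric 2x2 matrix standing in for S(s)."""
    return [[-1.0, 0.3 * s], [0.3 * s, -0.5]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def axpy(A, B, c):
    """Return A + c * B."""
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def rhs(s, M, P):
    """Derivatives (dM/ds, dP/ds) = (P, S M / n^2)."""
    SM = matmul(S_of(s), M)
    f = 1.0 / n_of(s) ** 2
    return P, [[f * SM[i][j] for j in range(2)] for i in range(2)]


def rk4_update(X, k1, k2, k3, k4, h):
    return [[X[i][j] + h / 6.0 * (k1[i][j] + 2.0 * k2[i][j]
                                  + 2.0 * k3[i][j] + k4[i][j])
             for j in range(2)] for i in range(2)]


def integrate(s_a, s_b, M, P, steps=2000):
    """Classical RK4 from s_a to s_b (s_b < s_a integrates backwards)."""
    h = (s_b - s_a) / steps
    s = s_a
    for _ in range(steps):
        k1M, k1P = rhs(s, M, P)
        k2M, k2P = rhs(s + h / 2, axpy(M, k1M, h / 2), axpy(P, k1P, h / 2))
        k3M, k3P = rhs(s + h / 2, axpy(M, k2M, h / 2), axpy(P, k2P, h / 2))
        k4M, k4P = rhs(s + h, axpy(M, k3M, h), axpy(P, k3P, h))
        M, P = (rk4_update(M, k1M, k2M, k3M, k4M, h),
                rk4_update(P, k1P, k2P, k3P, k4P, h))
        s += h
    return M, P


def homocentric_bundle(s_focus, s_end, L_dot_focus):
    """Bundle with L(s_focus) = 0; returns P at the focus and (M, P) at s_end."""
    n0 = n_of(s_focus)
    P0 = [[n0 * x for x in row] for row in L_dot_focus]  # dM/ds = n dL/ds at L = 0
    M_end, P_end = integrate(s_focus, s_end, [[0.0, 0.0], [0.0, 0.0]], P0)
    return P0, M_end, P_end


s1, s2 = 0.0, 1.0
P1_s1, M1_s2, P1_s2 = homocentric_bundle(s1, s2, [[1.0, 0.5], [0.0, 2.0]])
P2_s2, M2_s1, _ = homocentric_bundle(s2, s1, [[0.7, -0.2], [0.4, 1.1]])

# Theorem 6.4.2: C = n^2 (L'^T L - L^T L') = P^T M - M^T P stays zero.
C = axpy(matmul(transpose(P1_s2), M1_s2), matmul(transpose(M1_s2), P1_s2), -1.0)
# Theorem 6.4.3: both sides of (6.64), using det(n L) = det M.
lhs = abs(det(M1_s2)) / abs(det(P1_s1))
rhs_value = abs(det(M2_s1)) / abs(det(P2_s2))
```

Up to the integration error, `C` is the zero matrix and `lhs` agrees with
`rhs_value`. Working with $M = nL$ avoids having to differentiate $n$
explicitly.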

To give a physical interpretation to this theorem we now assume that our
underlying Lorentzian manifold is 4-dimensional and that, with respect to the
time orientation defined by the distinguished observer field $U$,
$\lambda(s_2)$ is in the future of $\lambda(s_1)$. We restrict our
consideration to the section of $\lambda$ between these two points. We
consider the totality of all Jacobi fields, defined via eq. (6.50) with
$L = L_1$, with $\delta_{AB}\,c^A c^B \le 1$. This can be interpreted as a
pencil of light rays issuing from the point $\lambda(s_1)$. Similarly, the
analogous construction carried through with $L = L_2$ gives a pencil of light
rays received at the point $\lambda(s_2)$. Now the numerators in (6.64), up to
a factor $\pi$ and up to the square of the index of refraction, give the
cross-sectional areas of these two pencils at the points $\lambda(s_1)$ and
$\lambda(s_2)$, respectively. It is important to realize that these quantities
have an invariant geometrical meaning, independent of the Sachs bein and of
the $g_o$-affine parametrization chosen for $\lambda$. On the other hand, the
denominators in (6.64), again up to the square of the index of refraction, are
measuring the opening angle of the respective pencil at its focal point. These
quantities are, again, independent of the choice of the Sachs bein; however,
they do depend on the affine parametrization chosen for $\lambda$. If we
switch to another affine parametrization by a transformation
$s \mapsto a s + b$ with real constants $a \neq 0$ and $b$, on both sides of
(6.64) the denominator is getting a factor $a^{-2}$. As for a ray (i.e., for a
$g_o$-light-like geodesic) the choice of a particular affine parametrization
is a matter of arbitrariness, this argument shows that the denominators of
(6.64) are “unphysical” in the sense that they cannot be measured. Therefore
we introduce the quantities

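$$ d_{\mathrm{lum}} = \sqrt{ \frac{|\det L_1(s_2)|}{|\det \dot{L}_1(s_1)|} } \; \big| g_{\lambda(s_1)}\big( K(s_1), U(s_1) \big) \big| \, , \tag{6.68} $$

$$ d_{\mathrm{ang}} = \sqrt{ \frac{|\det L_2(s_1)|}{|\det \dot{L}_2(s_2)|} } \; \big| g_{\lambda(s_2)}\big( K(s_2), U(s_2) \big) \big| \, , \tag{6.69} $$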



where $U(s)$ denotes the distinguished observer field at the point
$\lambda(s)$. $d_{\mathrm{lum}}$ and $d_{\mathrm{ang}}$ are invariant with
respect to changing the affine parametrization of $\lambda$ since the tangent
field $K$ in the numerator is stretched with the same factor as the derivative
operator (overdot) in the denominator. Here it is essential that we restrict
to the case $\dim(\mathcal{M}) = 4$ such that $L_1$ and $L_2$ are
$(2 \times 2)$-matrices. $d_{\mathrm{lum}}$ relates the cross-sectional area of
the $L_1$-pencil at $\lambda(s_2)$ to its opening angle at $\lambda(s_1)$ where
the latter is now measured as a solid angle in the local rest space of the
observer $U(s_1)$, see Figure 6.2. In cosmology $d_{\mathrm{lum}}$ is known as
the corrected luminosity distance from $\lambda(s_1)$ to $\lambda(s_2)$,
whereas the analogously defined quantity $d_{\mathrm{ang}}$ is known as the
angular diameter distance from $\lambda(s_1)$ to $\lambda(s_2)$. Now the
reciprocity law (6.64) can be rewritten in the form



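$$ d_{\mathrm{lum}} = d_{\mathrm{ang}} \; \frac{ n(s_1)^2\, \big| g_{\lambda(s_1)}\big( K(s_1), U(s_1) \big) \big| }{ n(s_2)^2\, \big| g_{\lambda(s_2)}\big( K(s_2), U(s_2) \big) \big| } \, . \tag{6.70} $$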



Please note that, by (6.23), the factor

$$ \frac{ g_{\lambda(s_1)}\big( K(s_1), U(s_1) \big) }{ g_{\lambda(s_2)}\big( K(s_2), U(s_2) \big) } = \frac{ (g_o)_{\lambda(s_1)}\big( K(s_1), U(s_1) \big) }{ (g_o)_{\lambda(s_2)}\big( K(s_2), U(s_2) \big) } \tag{6.71} $$

gives the redshift under which $U(s_1)$ is seen by $U(s_2)$.

Fig. 6.2. This illustration shows a pencil with central ray $\lambda$, issuing
from a light source at $\lambda(s_1)$ to an observer at $\lambda(s_2)$. The
corrected luminosity distance relates the cross-sectional area of the pencil at
$\lambda(s_2)$ to its opening angle at $\lambda(s_1)$, measured in the rest
space of the observer $U(s_1)$. The angular diameter distance is defined in an
analogous manner for a pencil around $\lambda$ with focus at $\lambda(s_2)$.

In the vacuum case n = 1, (6.70) gives a remarkable relation between 
corrected luminosity distance, angular diameter distance and redshift. (The 
observers U{si) and U{s 2 ) are arbitrary in the vacuum case.) As to the 
literature on this subject we refer to Etherington [41] who discovered the law 
(6.70) for the case n = 1; to Ellis [40] whose proof of the reciprocity law for 
the case n = 1 served as a model for our proof of Theorem 6.4.3; and to 
Schneider, Ehlers and Falco [128], Sect. 3.5, who give a detailed discussion 
of the reciprocity theorem for vacuum light rays and of its relevance for 
cosmology. It should also be mentioned that the name “reciprocity theorem”
goes back to Straubel [135] who introduced the analogue of Theorem 6.4.3
in ordinary optics. This classical reciprocity theorem is closely related to the
so-called sine condition for stigmatic imaging. The latter can be traced back
well into the 19th century to Clausius, Helmholtz and Abbe. It is discussed, 
e.g., in the standard textbook by Born and Wolf [16], p. 166. 



6.5 Stationary ray-optical structures 

Recalling Definition 5.3.2 of the symmetry algebra
$\mathcal{G}_{\mathcal{N}}$, stationarity of a ray-optical structure
$\mathcal{N}$ can be introduced in the following way.

Definition 6.5.1. A ray-optical structure $\mathcal{N}$ on $(\mathcal{M},g)$
is called stationary iff there is a vector field
$W \in \mathcal{G}_{\mathcal{N}}$ which is everywhere time-like, $g(W,W) < 0$.
If, in addition, the one-form $g(W,\,\cdot\,)$ is globally (or locally,
respectively) of the form $g(W,\,\cdot\,) = h\,dt$ with some scalar functions
$h$ and $t$, $\mathcal{N}$ is called globally (or locally, respectively)
static.

In the static case the spacetime is foliated into hypersurfaces $t$ = const.
which are $g$-orthogonal to the integral curves of $W$. It follows from the
well-known Frobenius Theorem that locally such a foliation exists if and only
if the one-form $\kappa = g(W,\,\cdot\,)$ satisfies the equation
$\kappa \wedge d\kappa = 0$, cf. Sachs and Wu [126], p. 53.

For the vacuum ray-optical structure $\mathcal{N}$ on $(\mathcal{M},g)$ we
know from Sect. 5.3 that the symmetry algebra coincides with the set of
conformal Killing vector fields of $g$. Hence, the vacuum ray-optical structure
is stationary in the sense of Definition 6.5.1 if and only if the Lorentzian
manifold $(\mathcal{M},g)$ is conformally stationary.

In terms of local Hamiltonians, stationary ray-optical structures can be
characterized in the following way.

Proposition 6.5.1. Let $\mathcal{N}$ be a stationary ray-optical structure on
$(\mathcal{M},g)$ and fix a point $u \in \mathcal{N}$. Then there is a local
Hamiltonian $H$ for $\mathcal{N}$, defined on a neighborhood of $u$, such that
$dH(\hat{W}) = 0$. Here $\hat{W}$ denotes the canonical lift (5.24) of the
time-like vector field $W \in \mathcal{G}_{\mathcal{N}}$.

Proof. Since the vector field $W$ is time-like, $\hat{W}(u) \neq 0$. Hence, we
can choose a codimension-one submanifold $\mathcal{P}$ of $T^*\mathcal{M}$
through $u$ that is transverse to the flow of $\hat{W}$. As the assumption
$W \in \mathcal{G}_{\mathcal{N}}$ implies that $\hat{W}(u)$ is tangent to
$\mathcal{N}$, it is then automatically guaranteed that $\mathcal{P}$ is
transverse to $\mathcal{N}$ at $u$. This transversality property implies that
$\mathcal{N} \cap \mathcal{P}$ is a codimension-one $C^\infty$ submanifold of
$\mathcal{P}$. If $\mathcal{P}$ is small enough, this guarantees the existence
of a function $h \colon \mathcal{P} \to \mathbb{R}$ such that the differential
$dh$ has no zeros and
$\mathcal{N} \cap \mathcal{P} = \{\, w \in \mathcal{P} \mid h(w) = 0 \,\}$.
Then the conditions $H|_{\mathcal{P}} = h$ and $dH(\hat{W}) = 0$ define a real
valued function $H$ on a neighborhood of $u$. By construction, $H$ is a local
Hamiltonian for $\mathcal{N}$. □

In a natural chart, induced by a chart $(x^1, \dots, x^n)$ on $\mathcal{M}$
with $W = \partial/\partial x^n$, the equation $dH(\hat{W}) = 0$ means that
$H$ is independent of the coordinate $x^n$.

For a stationary ray-optical structure, the time-like vector field
$W \in \mathcal{G}_{\mathcal{N}}$ defines a normalized observer field
$V = e^{-f}\,W$, where

$$ f = \frac{1}{2} \ln\big( - g(W,W) \big) \, . \tag{6.72} $$

By Proposition 6.2.2, $f$ is a redshift potential for this observer field. This
observation is closely related to the fact that, by Proposition 5.3.3, the
momentum $\theta(W) \colon T^*\mathcal{M} \to \mathbb{R}$ is constant along
each lifted ray. The existence of this constant of motion is crucial for the
dimensional reduction of stationary ray-optical structures. We are now going
to discuss this reduction formalism in detail.
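
Before turning to the reduction, the role of $f$ can be illustrated by a
simple example. Since $\theta(W) = -\omega_0$ is constant along a lifted ray,
the frequency $-u(V)$ measured by the observer field $V = e^{-f}\,W$ is
$e^{-f}\,\omega_0$. The sketch below assumes, as an example not taken from the
text, the Schwarzschild metric with $W = \partial/\partial t$, for which
$g(W,W) = -(1 - 2m/r)$ in units with $G = c = 1$. Function names and sample
radii are hypothetical.

```python
import math


def redshift_potential(r: float, m: float = 1.0) -> float:
    """f = (1/2) ln(-g(W, W)) with g(W, W) = -(1 - 2m/r); needs r > 2m."""
    if r <= 2.0 * m:
        raise ValueError("W = d/dt is time-like only for r > 2m")
    return 0.5 * math.log(1.0 - 2.0 * m / r)


def measured_frequency(omega_0: float, r: float, m: float = 1.0) -> float:
    """Frequency exp(-f) * omega_0 seen by the static observer at radius r."""
    return math.exp(-redshift_potential(r, m)) * omega_0


def frequency_ratio(r_emit: float, r_obs: float, m: float = 1.0) -> float:
    """omega_emit / omega_obs = exp(f(r_obs) - f(r_emit)), independent of omega_0."""
    return math.exp(redshift_potential(r_obs, m) - redshift_potential(r_emit, m))


ratio = frequency_ratio(r_emit=4.0, r_obs=8.0)
```

The ratio depends only on the difference of $f$ between emission and
observation, which is what makes $f$ act as a potential for the redshift.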

The goal is to study, for a stationary ray-optical structure $\mathcal{N}$ on
our $n$-dimensional spacetime $(\mathcal{M},g)$, the dynamics of rays in terms
of their projections onto an $(n-1)$-dimensional space $\hat{\mathcal{M}}$. In
the first step we have to construct the space $\hat{\mathcal{M}}$. The obvious
idea is to introduce $\hat{\mathcal{M}}$ as a quotient space of $\mathcal{M}$,
calling two points of $\mathcal{M}$ equivalent iff they are connected by an
integral curve of the time-like vector field
$W \in \mathcal{G}_{\mathcal{N}}$, and to hope that $\hat{\mathcal{M}}$ is a
smooth Hausdorff manifold. This is, indeed, always true locally, i.e., if we
restrict to a sufficiently small neighborhood of an arbitrary point in
$\mathcal{M}$. Globally, however, the topological space $\hat{\mathcal{M}}$
may violate the Hausdorff axiom and need not admit a smooth manifold structure
such that the natural projection
$\pi \colon \mathcal{M} \to \hat{\mathcal{M}}$ becomes a submersion. E.g., for
a time-like vector field $W$ on Minkowski space with one point removed the
quotient space necessarily violates the Hausdorff property. Also, it is easy
to verify that the quotient space cannot be a smooth manifold if an integral
curve of $W$ is almost closed, coming back into any neighborhood of some point
infinitely often without being periodic.

It is, thus, necessary to introduce additional assumptions to make sure
that the quotient space $\hat{\mathcal{M}}$ is a smooth Hausdorff manifold. To
put this rigorously we introduce the following terminology, cf. Figure 6.3.

Definition 6.5.2. Let $W$ be a time-like $C^\infty$ vector field on
$(\mathcal{M},g)$. A $C^\infty$ function $t \colon \mathcal{M} \to \mathbb{R}$
is called a global timing function for $W$ iff

(a) $dt(W) = 1$ and

(b) for any $t_1$ and $t_2$ in $\mathbb{R}$ the flow of $W$ maps the
hypersurface $t = t_1$ diffeomorphically onto the hypersurface $t = t_2$.

It would be misleading to call $t$ a “time function”, rather than a “timing
function”, since the hypersurfaces $t$ = const. need not be space-like with
respect to the Lorentzian metric $g$.

If $t$ is a global timing function for $W$, the above-mentioned quotient
space $\hat{\mathcal{M}}$ can be identified with any of the hypersurfaces
$t$ = const.; this identification makes $\hat{\mathcal{M}}$ into an
$(n-1)$-dimensional $C^\infty$ manifold such that the natural projection
$\pi \colon \mathcal{M} \to \hat{\mathcal{M}}$ becomes a submersion. Then the
map $(\pi,t) \colon \mathcal{M} \to \hat{\mathcal{M}} \times \mathbb{R}$ is a
global diffeomorphism.

The above-mentioned counter-examples demonstrate that, for an arbitrary
time-like vector field $W$, a global timing function need not exist. It is
interesting to note the following result. If $W$ is a time-like vector field
on a Lorentzian manifold that has no closed integral curves, then the
Hausdorff property of the quotient space $\hat{\mathcal{M}}$ guarantees the
existence of a global timing function for an appropriate reparametrization of
$W$. The proof can be taken over from Harris [56], Theorem 2.

If $t \colon \mathcal{M} \to \mathbb{R}$ is a global timing function for $W$,
a second function $t' \colon \mathcal{M} \to \mathbb{R}$ is, again, a global
timing function for $W$ if and only if $t'$ is of the form

$$ t' = t + h \circ \pi \tag{6.73} $$

where $h \colon \hat{\mathcal{M}} \to \mathbb{R}$ is any $C^\infty$ function.
In bundle theoretical language, two different global timing functions for $W$
define two different global trivializations for the fiber bundle
$\pi \colon \mathcal{M} \to \hat{\mathcal{M}}$. If $t$ can be chosen in such a
way that the hypersurfaces $t$ = const. are $g$-orthogonal to $W$, this
additional condition fixes the timing function uniquely up to an additive
constant. However, since it is our goal to study stationary ray-optical
structures, and not only globally static ones, we have to deal with situations
where such a choice is not possible.

Fig. 6.3. A global timing function $t$ for a time-like vector field $W$ allows
one to write spacetime $\mathcal{M}$ as a product of space
$\hat{\mathcal{M}}$ and time $\mathbb{R}$.

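Now let us assume that we have a global timing function
$t \colon \mathcal{M} \to \mathbb{R}$ for $W$. Then the global diffeomorphism
$(\pi,t) \colon \mathcal{M} \to \hat{\mathcal{M}} \times \mathbb{R}$ induces a
splitting of the cotangent spaces
$T^*_q\mathcal{M} = T^*_{\pi(q)}\hat{\mathcal{M}} \oplus T^*_{t(q)}\mathbb{R}$
at all points $q \in \mathcal{M}$.

Projecting onto the first factor gives a reduction map

$$ \mathrm{red} \colon T^*\mathcal{M} \to T^*\hat{\mathcal{M}} \, . \tag{6.74} $$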

If we change the timing function according to (6.73), the reduction map 
undergoes the transformation 



red(u) I — ^ red'(w) = red(w) + (dh) 7 r(g) 



(6.75) 
for $u \in T^*_q\mathcal{M}$. In local coordinates, the reduction map is most
easily expressed if we choose coordinates $(x^1, \dots, x^n)$ on
$\mathcal{M}$ with $t = x^n$ and $W = \partial/\partial x^n$. Then we can use
$(x^1, \dots, x^{n-1})$ as coordinates on $\hat{\mathcal{M}}$ and the
reduction map takes the form

$$ (x^1, \dots, x^n, p_1, \dots, p_n) \longmapsto (x^1, \dots, x^{n-1}, p_1, \dots, p_{n-1}) \, . \tag{6.76} $$

Please note that in such a coordinate system the momentum coordinate $p_n$
coincides with the function $\theta(W)$ which is a constant of motion
according to Proposition 5.3.3.

We are now ready to formulate a reduction theorem for stationary ray-optical
structures. Roughly speaking, this theorem says that a stationary ray-optical
structure $\mathcal{N}$ on our $n$-dimensional spacetime $(\mathcal{M},g)$
induces a one-parameter family of ray-optical structures
$\hat{\mathcal{N}}_{\omega_0}$ on the $(n-1)$-dimensional quotient space
$\hat{\mathcal{M}}$, where the family parameter $\omega_0 \in \mathbb{R}$ is
given by the value of the conserved momentum, $\omega_0 = -\theta(W)$. For the
construction of the reduced ray-optical structure
$\hat{\mathcal{N}}_{\omega_0}$ it is necessary to choose a global timing
function $t$ for the time-like vector field
$W \in \mathcal{G}_{\mathcal{N}}$, i.e., in situations where such a $t$ does
not exist the reduction does not work globally. Moreover, the theorem requires
two transversality assumptions. Even locally, for the reduction process to
give a reduced ray-optical structure near a point $u \in \mathcal{N}$ it is
necessary that (i) the covector $u$ is not a multiple of the differential $dt$
at the point $q = \tau^*_{\mathcal{M}}(u)$; and (ii) the fiber derivative
$\mathbb{F}H(u)$ is not a multiple of $W$ at the point
$q = \tau^*_{\mathcal{M}}(u)$, where $H$ is any local Hamiltonian for
$\mathcal{N}$. Note that $\mathbb{F}H(u)$ is a multiple of $W$ at those points
where the ray velocity with respect to the normalized observer field
$V = e^{-f}\,W$ has a zero, see (6.7). Hence, the second transversality
assumption just excludes all points where the ray-optical structure has a
pathological behavior.

The precise formulation of the reduction theorem reads as follows. 

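Theorem 6.5.1. (Reduction theorem for stationary ray-optical
structures) Let $\mathcal{N}$ be a stationary ray-optical structure on
$(\mathcal{M},g)$ and let $t \colon \mathcal{M} \to \mathbb{R}$ be a global
timing function for the time-like vector field
$W \in \mathcal{G}_{\mathcal{N}}$. As outlined above, this induces a global
diffeomorphism
$(\pi,t) \colon \mathcal{M} \to \hat{\mathcal{M}} \times \mathbb{R}$ and a
reduction map $\mathrm{red} \colon T^*\mathcal{M} \to T^*\hat{\mathcal{M}}$.
Now fix a value $\omega_0 \in \mathbb{R}$ such that the set
$\mathcal{Q}_{\omega_0} = \{\, u \in \mathcal{N} \mid \theta(W) = -\omega_0 \,\}$
is non-empty. Assume that for all points $u \in \mathcal{Q}_{\omega_0}$ (i) the
covector $u$ is not a multiple of the differential $dt$ at
$q = \tau^*_{\mathcal{M}}(u)$ and (ii) the fiber derivative $\mathbb{F}H(u)$ is
not a multiple of $W$ at $q = \tau^*_{\mathcal{M}}(u)$, where $H$ is any local
Hamiltonian for $\mathcal{N}$. Then
$\hat{\mathcal{N}}_{\omega_0} = \mathrm{red}(\mathcal{Q}_{\omega_0})$ is a
ray-optical structure on $\hat{\mathcal{M}}$. A $C^\infty$ curve
$\hat{\xi} \colon I \to \hat{\mathcal{N}}_{\omega_0}$ is a lifted ray (or a
lifted virtual ray, respectively) of $\hat{\mathcal{N}}_{\omega_0}$ if and only
if it can be written in the form $\hat{\xi} = \mathrm{red} \circ \xi$ where
$\xi \colon I \to \mathcal{Q}_{\omega_0} \subseteq \mathcal{N}$ is a lifted ray
(or a lifted virtual ray, respectively) of $\mathcal{N}$.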

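Proof. Recall that $\hat{W}$ is the Hamiltonian vector field of the function
$\theta(W)$,

$$ d\big( \theta(W) \big) = \Omega( \hat{W}, \,\cdot\, ) \, . \tag{6.77} $$

This equation shows that the differential $d(\theta(W))$ has no zeros. Hence,
the set
$\mathcal{K}_{\omega_0} = \{\, u \in T^*\mathcal{M} \mid (\theta(W))(u) = -\omega_0 \,\}$
is a codimension-one submanifold of $T^*\mathcal{M}$. On
$\mathcal{K}_{\omega_0}$ the canonical two-form $\Omega$ induces a two-form
with a one-dimensional kernel. At each point of $\mathcal{K}_{\omega_0}$ this
kernel is spanned by $\hat{W}$, as can be read from (6.77). Let us call two
points of $\mathcal{K}_{\omega_0}$ equivalent if they can be connected by an
integral curve of $\hat{W}$. Then the quotient space
$\hat{\mathcal{K}}_{\omega_0}$ carries a Hausdorff manifold structure such that
the natural projection
$\mathcal{K}_{\omega_0} \to \hat{\mathcal{K}}_{\omega_0}$ becomes a
submersion. This follows from the fact that $W$ admits a global timing
function. Moreover, the induced two-form defines a symplectic structure on
$\hat{\mathcal{K}}_{\omega_0}$. It is worthwhile to reconsider this
construction in terms of a natural chart induced by coordinates
$(x^1, \dots, x^n)$ on $\mathcal{M}$ with $t = x^n$ and
$W = \partial/\partial x^n$. Then $\mathcal{K}_{\omega_0}$ is given by the
equation $p_n = -\omega_0$, i.e., $\mathcal{K}_{\omega_0}$ is parametrized by
the coordinates $(x^1, \dots, x^n, p_1, \dots, p_{n-1})$. Forming the quotient
$\hat{\mathcal{K}}_{\omega_0}$ comes up to factoring out the coordinate $x^n$.
This shows that $\hat{\mathcal{K}}_{\omega_0}$ can be identified, as a
symplectic manifold, with the cotangent bundle $T^*\hat{\mathcal{M}}$, and that
the natural projection
$\mathcal{K}_{\omega_0} \to \hat{\mathcal{K}}_{\omega_0}$ can be identified
with the restriction of the reduction map (6.74) to $\mathcal{K}_{\omega_0}$.

Now we consider the set
$\mathcal{Q}_{\omega_0} = \mathcal{N} \cap \mathcal{K}_{\omega_0}$. For all
points $u \in \mathcal{Q}_{\omega_0}$, $\mathbb{F}H(u)$ is linearly independent
of $W_{\tau^*_{\mathcal{M}}(u)}$, by assumption. Thus, the characteristic
direction of $\mathcal{K}_{\omega_0}$ (i.e., the direction spanned by
$\hat{W}$) and the characteristic direction of $\mathcal{N}$ (i.e., the lifted
ray direction) do not coincide. This implies that $\mathcal{N}$ and
$\mathcal{K}_{\omega_0}$ have a transverse intersection at all points
$u \in \mathcal{Q}_{\omega_0}$. Thus, $\mathcal{Q}_{\omega_0}$ is a closed
codimension-one submanifold of $\mathcal{K}_{\omega_0}$, i.e., a manifold of
dimension $(2n-2)$. Since $W \in \mathcal{G}_{\mathcal{N}}$,
$\mathcal{Q}_{\omega_0}$ is invariant under the flow of $\hat{W}$. This implies
that the set
$\hat{\mathcal{N}}_{\omega_0} = \mathrm{red}(\mathcal{Q}_{\omega_0})$ is a
closed submanifold of $T^*\hat{\mathcal{M}} = \hat{\mathcal{K}}_{\omega_0}$ of
dimension $(2n-3)$. Since we assume that $u \in \mathcal{Q}_{\omega_0}$ is
never a multiple of $(dt)_{\tau^*_{\mathcal{M}}(u)}$,
$\hat{\mathcal{N}}_{\omega_0}$ does not meet the zero section in
$T^*\hat{\mathcal{M}}$; since we assume that for
$u \in \mathcal{Q}_{\omega_0}$ the fiber derivative $\mathbb{F}H(u)$ is never a
multiple of $W_{\tau^*_{\mathcal{M}}(u)}$, $\hat{\mathcal{N}}_{\omega_0}$ is
everywhere transverse to the fibers of $T^*\hat{\mathcal{M}}$. This proves that
$\hat{\mathcal{N}}_{\omega_0}$ is, indeed, a ray-optical structure on
$\hat{\mathcal{M}}$. To prove the rest of the proposition, we observe that
$\mathcal{Q}_{\omega_0}$ is foliated into the two-surfaces spanned by lifted
rays and by integral curves of $\hat{W}$. If we denote the pull-back of the
canonical two-form $\Omega$ to $\mathcal{Q}_{\omega_0}$ by $\Omega_{\omega_0}$,
these two-surfaces can be characterized as the integral manifolds of the kernel
of $\Omega_{\omega_0}$. Hence, red maps any such two-surface onto the image of
a lifted ray of $\hat{\mathcal{N}}_{\omega_0}$. This proves that for each
lifted ray $\hat{\xi} \colon I \to \hat{\mathcal{N}}_{\omega_0}$ there is a
one-parameter family of lifted rays
$\xi \colon I \to \mathcal{Q}_{\omega_0}$ of $\mathcal{N}$ such that
$\hat{\xi} = \mathrm{red} \circ \xi$. Since the map red is fiber preserving,
this result remains true if “lifted ray” is replaced with “lifted virtual
ray”. □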

Please note that at a point $u \in \mathcal{Q}_{\omega_0} \cap T^*_q\mathcal{M}$ the frequency with respect
to the normalized observer field $V = e^{-f}W$ is equal to $e^{-f(q)}\omega_0$. If $\mathcal{N}$ is
reversible one can restrict to values $\omega_0 > 0$ without loss of generality.

Locally around a point $u \in \mathcal{N}$ the reduction can be carried through in
the following way. It is convenient to introduce coordinates $(x^1, \dots, x^n)$ on
$\mathcal{M}$ such that $t = x^n$ and $W = \partial/\partial x^n$. By Proposition 6.5.1, we can choose a
local Hamiltonian H for $\mathcal{N}$ around u which is independent of $x^n$. It is easy to
verify that a local Hamiltonian $\hat H$ for the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$
is given simply by setting $p_n$ equal to $-\omega_0$, i.e.,

$$\hat H(x^1, \dots, x^{n-1}, p_1, \dots, p_{n-1}) = H(x^1, \dots, x^{n-1}, p_1, \dots, p_{n-1}, -\omega_0) . \tag{6.78}$$

It is important to realize that the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$
depends on the choice of the global timing function for W in the following way.
If, in the situation of Theorem 6.5.1, the global timing function is changed
according to (6.73), the reduced ray-optical structure changes according to

$$\hat{\mathcal{N}}_{\omega_0} \longmapsto \hat{\mathcal{N}}'_{\omega_0} = \hat{\mathcal{N}}_{\omega_0} + \omega_0\, dh(\hat{\mathcal{M}}) . \tag{6.79}$$

$\hat{\mathcal{N}}'_{\omega_0}$ is again a ray-optical structure on $\hat{\mathcal{M}}$ provided that the transversality
condition (i) of Theorem 6.5.1 is satisfied for the new global timing function
as well as for the old one. The proof of (6.79) follows immediately from the
transformation behavior (6.75) of the reduction map.

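In this situation, Theorem 6.5.1 gives a natural one-to-one relation
between lifted rays of $\hat{\mathcal{N}}_{\omega_0}$ and lifted rays of $\hat{\mathcal{N}}'_{\omega_0}$. This relation is defined
by associating a lifted ray $\hat\xi$ of $\hat{\mathcal{N}}_{\omega_0}$ with a lifted ray $\hat\xi'$ of $\hat{\mathcal{N}}'_{\omega_0}$ iff they
are representable in the form $\hat\xi = \mathrm{red} \circ \xi$ and $\hat\xi' = \mathrm{red}' \circ \xi$ with the same
lifted ray $\xi : I \to \mathcal{Q}_{\omega_0}$ of $\mathcal{N}$. By (6.75), $\hat\xi$ and $\hat\xi'$ are then related by

$$\hat\xi'(s) = \hat\xi(s) + \omega_0\, (dh)_{\hat\lambda(s)} \tag{6.80}$$

for $s \in I$, where $\hat\lambda$ denotes the projection of $\hat\xi$ to $\hat{\mathcal{M}}$. There is an analogous
one-to-one relation for lifted virtual rays.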

In terms of a natural chart on $T^*\hat{\mathcal{M}}$, (6.80) takes the form

$$x'^{\rho}(s) = x^{\rho}(s) , \tag{6.81}$$

$$p'_{\rho}(s) = p_{\rho}(s) + \omega_0\, \frac{\partial h}{\partial x^{\rho}}\bigl(x^1(s), \dots, x^{n-1}(s)\bigr) \tag{6.82}$$

where $\rho = 1, \dots, n-1$. This observation implies that the rays of $\hat{\mathcal{N}}_{\omega_0}$ and
$\hat{\mathcal{N}}'_{\omega_0}$ coincide although the lifted rays do not. Similarly, the virtual rays of
$\hat{\mathcal{N}}_{\omega_0}$ and $\hat{\mathcal{N}}'_{\omega_0}$ coincide although the lifted virtual rays do not. From (6.80)
or (6.82) we can also read the transformation behavior of wave surfaces.
The function $S : \hat{\mathcal{M}} \to \mathbb{R}$ is a classical solution of the eikonal equation of
$\hat{\mathcal{N}}_{\omega_0}$ if and only if the function $S + \omega_0 h$ is a classical solution of the eikonal
equation of $\hat{\mathcal{N}}'_{\omega_0}$. Clearly, both solutions are associated with the same family
of rays. There is a far-reaching formal analogy between this situation and the
dynamics of charged particles moving in a magnetostatic field. The change
of the global timing function corresponds to a gauge transformation of the
magnetostatic potential. In both cases, the canonical momentum coordinates
undergo a transformation of the form (6.82). In view of this analogy one
might say that the rays of $\hat{\mathcal{N}}_{\omega_0}$ are “gauge invariant” whereas the wave
surfaces are not. This “gauge freedom” can be removed if our stationary
ray-optical structure is globally static. Then we can “fix the gauge” by choosing
the hypersurfaces t = const. g-orthogonal to the integral curves of W. If,
however, our stationary ray-optical structure is not globally static, then there
is no distinguished choice for the global timing function and we have to live
with the “gauge freedom”.

Having clarified the dependence of the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$
on the choice of the global timing function, we are now going to investigate
its dependence on the parameter $\omega_0$ which fixes the frequency of the
rays. If the assumptions of Theorem 6.5.1 are satisfied for two real numbers
$\omega_0$ and $\omega_0'$, the reduced ray-optical structures $\hat{\mathcal{N}}_{\omega_0}$ and $\hat{\mathcal{N}}_{\omega_0'}$ are in general
completely different. If, however, the stationary ray-optical structure $\mathcal{N}$ is
dilation-invariant in the sense of Definition 5.4.1 (i.e., if the medium under
consideration is non-dispersive), then $\hat{\mathcal{N}}_{\omega_0}$ and $\hat{\mathcal{N}}_{\omega_0'}$ are related by the
following proposition.

Proposition 6.5.2. Assume that all the assumptions of Theorem 6.5.1 are
satisfied and that, in addition, $\mathcal{N}$ is dilation-invariant. Then the assumptions
of Theorem 6.5.1 are still satisfied if $\omega_0$ is replaced with $\omega_0' = c\,\omega_0$ for any
real number c > 0, and the reduced ray-optical structures are related by

$$\hat{\mathcal{N}}_{\omega_0'} = c\, \hat{\mathcal{N}}_{\omega_0} . \tag{6.83}$$

In particular, the rays of $\hat{\mathcal{N}}_{\omega_0'}$ coincide with the rays of $\hat{\mathcal{N}}_{\omega_0}$. If $\mathcal{N}$ is not only
dilation-invariant but also reversible, (6.83) carries over to the case c < 0.

Proof. Recall that $\mathcal{N}$ is dilation-invariant if and only if $\mathcal{N} = c\,\mathcal{N}$ for all real
numbers c > 0. As we have seen in Sect. 5.4, dilation-invariance implies that,
for any $u \in \mathcal{N}$ and any local Hamiltonian H of $\mathcal{N}$ which is defined around u,
the fiber derivative satisfies $\mathbb{F}H(cu) = c\,\mathbb{F}H(u)$ for all real numbers c > 0. If $\mathcal{N}$
is not only dilation-invariant but also reversible, these properties remain valid
for c < 0. On the basis of these observations the proof of Proposition 6.5.2 is
an easy exercise. □

Proposition 6.5.2 is, of course, in perfect agreement with the basic idea 
that in a non-dispersive medium the spatial path of a light ray is independent 
of its frequency. 

If we have a reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$, constructed by the method
of Theorem 6.5.1 from a stationary ray-optical structure, we can integrate
the canonical one-form $\hat\theta$ of $T^*\hat{\mathcal{M}}$ along each lifted virtual ray of $\hat{\mathcal{N}}_{\omega_0}$. Quite
generally, the integral over the canonical one-form is known as the action
functional and will play a central role in our discussion of variational
principles in Chap. 7 below. In the case at hand, it is helpful to introduce the
following definition.

Definition 6.5.3. Consider the situation of Theorem 6.5.1 with $\omega_0 \neq 0$.
For a lifted virtual ray $\hat\xi : [s_1, s_2] \to \hat{\mathcal{N}}_{\omega_0}$ of the reduced ray-optical structure,

$$I(\hat\xi) = \frac{1}{\omega_0} \int_{\hat\xi} \hat\theta = \frac{1}{\omega_0} \int_{s_1}^{s_2} \hat\theta_{\hat\xi(s)}\bigl(\dot{\hat\xi}(s)\bigr)\, ds \tag{6.84}$$

is called the optical path length of $\hat\xi$. Here $\hat\theta$ denotes the canonical one-form
on $T^*\hat{\mathcal{M}}$.

If $\hat{\mathcal{N}}_{\omega_0}$ is everywhere transverse to the Euler vector field E on $T^*\hat{\mathcal{M}}$, the
integral (6.84) is a strictly monotonic function of the upper bound $s_2$. In this
case the optical path length gives us a distinguished parametrization along
each lifted virtual ray of $\hat{\mathcal{N}}_{\omega_0}$. We have already seen that such a distinguished
parametrization exists if and only if the Euler vector field is transverse to
the ray-optical structure; please recall Proposition 5.4.7 and the subsequent
discussion.

It is important to realize that the optical path length is “gauge dependent”
in the following sense. Under a change of the global timing function the
reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ changes into $\hat{\mathcal{N}}'_{\omega_0}$ according to (6.79).
Thereby each lifted virtual ray $\hat\xi : [s_1, s_2] \to \hat{\mathcal{N}}_{\omega_0}$ of $\hat{\mathcal{N}}_{\omega_0}$ changes into a
lifted virtual ray $\hat\xi' : [s_1, s_2] \to \hat{\mathcal{N}}'_{\omega_0}$ of $\hat{\mathcal{N}}'_{\omega_0}$ according to (6.80). If we
compare the optical path length of $\hat\xi'$ with the optical path length of $\hat\xi$ we
find that they do not coincide but are related by

$$I(\hat\xi') = I(\hat\xi) + h\bigl(\hat\lambda(s_2)\bigr) - h\bigl(\hat\lambda(s_1)\bigr) \tag{6.85}$$

where $\hat\lambda$ denotes the projection of $\hat\xi$ to $\hat{\mathcal{M}}$.

The following proposition relates the optical path length to the “travel
time”, measured in terms of the global timing function t. This result is of
particular relevance in view of Fermat’s principle to be discussed in Chap. 7
below.

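Proposition 6.5.3. Let, in the situation of Theorem 6.5.1, $\xi : [s_1, s_2] \to \mathcal{N}$
be a lifted virtual ray of $\mathcal{N}$ along which the conserved momentum $\theta(W)$ takes
the value $-\omega_0 \neq 0$. Then the optical path length of $\hat\xi = \mathrm{red} \circ \xi$ is given by

$$I(\hat\xi) = \frac{1}{\omega_0} \int_{s_1}^{s_2} \theta_{\xi(s)}\bigl(\dot\xi(s)\bigr)\, ds + t\bigl(\lambda(s_2)\bigr) - t\bigl(\lambda(s_1)\bigr) , \tag{6.86}$$

where $\lambda$ denotes the projection of $\xi$ to $\mathcal{M}$. If $\mathcal{N}$ is dilation-invariant this
equation simplifies to

$$I(\hat\xi) = t\bigl(\lambda(s_2)\bigr) - t\bigl(\lambda(s_1)\bigr) . \tag{6.87}$$

Proof. Since $\hat\xi = \mathrm{red} \circ \xi$ and $\theta(W)$ takes the constant value $-\omega_0$ along $\xi$,

$$\theta_{\xi(s)}\bigl(\dot\xi(s)\bigr) = \hat\theta_{\hat\xi(s)}\bigl(\dot{\hat\xi}(s)\bigr) - \omega_0\, dt\bigl(\dot\lambda(s)\bigr) \tag{6.88}$$

for $s \in [s_1, s_2]$. To verify this equation we can use a natural chart induced by
coordinates $(x^1, \dots, x^n)$ on $\mathcal{M}$ with $x^n = t$ and $\partial/\partial x^n = W$; then (6.88) is
just the trivial identity $p_a\, dx^a = p_\rho\, dx^\rho + p_n\, dx^n$. Integrating (6.88) from $s_1$
to $s_2$ yields (6.86). If $\mathcal{N}$ is dilation-invariant, the integral vanishes owing to
Proposition 5.4.7. □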

If the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ is strongly hyperregular and thus
orientable, we know from Proposition 5.2.4 that there is a one-to-one relation 
between positively oriented virtual rays and positively oriented lifted virtual 
rays. In that case the optical path length can be viewed as a functional on 
(positively oriented) virtual rays rather than on lifted virtual rays. Again, this 
observation is crucial in view of Fermat’s principle. As a matter of fact, for 
any stationary ray-optical structure with relevance to physics the reduced ray- 
optical structure is indeed strongly hyperregular or at least strongly regular. 
In the latter case the above-mentioned one-to-one correspondence holds true 
at least locally. The following proposition gives a useful criterion. 

Proposition 6.5.4. Assume that all the assumptions of Theorem 6.5.1 are
satisfied and fix a point $u \in \mathcal{N}$ with $(\theta(W))(u) = -\omega_0$. Then the reduced
ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ is strongly regular at the point $\hat u = \mathrm{red}(u)$ if and
only if the condition

$$\det \begin{pmatrix} (H^{ab}) & (H^a) & (W^a) \\ (H^b) & 0 & 0 \\ (W^b) & 0 & 0 \end{pmatrix} \neq 0 \tag{6.89}$$

holds at u in any natural chart. Here we use the same matrix notation as in
(5.15), with H denoting any local Hamiltonian for $\mathcal{N}$ and $W^a$ denoting the
components of the vector field W.



Proof. It is easy to check that (6.89) is independent of which natural chart
and which local Hamiltonian has been chosen. We choose a natural chart
induced by coordinates $(x^1, \dots, x^n)$ on $\mathcal{M}$ with $t = x^n$ and $W = \partial/\partial x^n$, and
we choose a local Hamiltonian H that is independent of $x^n$. This is possible
owing to Proposition 6.5.1. Then equation (6.78) gives us a local Hamiltonian
$\hat H$ for $\hat{\mathcal{N}}_{\omega_0}$ around $\hat u$. As $W^a = \delta^a_n$ in the coordinates chosen, condition (6.89)
holds at u if and only if the condition

$$\det \begin{pmatrix} (\hat H^{\rho\sigma}) & (\hat H^{\rho}) \\ (\hat H^{\sigma}) & 0 \end{pmatrix} \neq 0 \tag{6.90}$$

holds at $\hat u$, where $\rho$ is an index numbering rows and $\sigma$ is an index numbering
columns, both running from 1 to n - 1. By Definition 5.2.2, (6.90) holds at $\hat u$
if and only if $\hat{\mathcal{N}}_{\omega_0}$ is strongly regular at that point. □

In many cases of interest condition (6.89) can be checked quickly with the
help of the following result from linear algebra.

Proposition 6.5.5. If the matrix $(H^{ab})$ is invertible, $H^{ab} H_{bc} = \delta^a_c$, (6.89)
is equivalent to

$$\det \begin{pmatrix} H_{ab} H^a H^b & H_{cd} H^c W^d \\ H_{ef} H^e W^f & H_{gh} W^g W^h \end{pmatrix} \neq 0 . \tag{6.91}$$

Proof. We can assume that $(H^a)$ and $(W^a)$ are linearly independent since
otherwise (6.89) and (6.91) are both violated. With this assumption, (6.89)
is satisfied if and only if the image of the (n - 2)-dimensional vector space
$\{ (Z_b) \in \mathbb{R}^n \mid H^a Z_a = W^a Z_a = 0 \}$ under $(H^{ab})$ is transverse to the
2-dimensional space spanned by $(H^a)$ and $(W^a)$. This is the case if and only
if $(H_{ab})$ is non-degenerate on the 2-dimensional space spanned by $(H^a)$ and
$(W^a)$, i.e., if and only if (6.91) holds true. □
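The equivalence of (6.89) and (6.91) can be read as an instance of the Schur
complement formula for bordered determinants: for invertible $(H^{ab})$ the
determinant in (6.89) equals $\det(H^{ab})$ times the 2x2 determinant in (6.91).
The following is only a minimal sketch that checks this identity with exact
rational arithmetic for one sample. The matrix entries and the helper names
(det, inverse, bordered_det, gram_det) are hypothetical and chosen purely for
illustration; they do not come from the text.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def inverse(m):
    """Inverse by Gauss-Jordan elimination (assumes m is invertible)."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def bordered_det(H2, H1, W):
    """Determinant on the left-hand side of (6.89)."""
    n = len(H2)
    rows = [list(H2[a]) + [H1[a], W[a]] for a in range(n)]
    rows.append(list(H1) + [0, 0])
    rows.append(list(W) + [0, 0])
    return det(rows)

def gram_det(H2, H1, W):
    """2x2 determinant on the left-hand side of (6.91)."""
    Hinv = inverse(H2)  # plays the role of (H_ab)
    n = len(H2)
    def q(u, v):
        return sum(Hinv[a][b] * u[a] * v[b] for a in range(n) for b in range(n))
    return q(H1, H1) * q(W, W) - q(H1, W) * q(W, H1)

# Hypothetical sample data for n = 3.
H2 = [[2, 1, 0], [1, -3, 1], [0, 1, 1]]   # symmetric, invertible (H^{ab})
H1 = [1, 2, -1]                           # (H^a)
W = [0, 1, 1]                             # (W^a)

lhs = bordered_det(H2, H1, W)
rhs = det(H2) * gram_det(H2, H1, W)
print(lhs == rhs, lhs != 0)
```

Both determinants vanish together when $(W^a)$ is chosen as a multiple of
$(H^a)$, which mirrors the first step of the proof.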



6.6 Stationary ray optics in vacuum and in simple media 

In the preceding section we have established the general features of the re- 
duction formalism for stationary ray-optical structures. To illustrate these 
results by way of example, we shall now carry through the reduction in full 
detail for stationary vacuum ray-optical structures. To that end we have to 
assume that we have a g-time-like vector field $W \in \mathcal{G}_{\mathcal{N}}$, where $\mathcal{N}$ denotes
the vacuum ray-optical structure on $(\mathcal{M}, g)$. According to our results of
Sect. 5.3, the condition $W \in \mathcal{G}_{\mathcal{N}}$ means that W is a conformal Killing vector
field of the metric g, i.e., that the Lie derivative $L_W g$ is a multiple of g. This
implies that W is a Killing vector field of the rescaled metric $e^{-2f} g$,

$$L_W \bigl( e^{-2f} g \bigr) = 0 \tag{6.92}$$

where $f = \tfrac{1}{2} \ln\bigl( -g(W, W) \bigr)$. Hence, the one-form

$$\phi = -e^{-2f} g(W, \,\cdot\,) \tag{6.93}$$

satisfies

$$\phi(W) = 1 \quad \text{and} \quad L_W \phi = 0 . \tag{6.94}$$

Now let us assume that we have a global timing function $t : \mathcal{M} \to \mathbb{R}$ for W.
This gives us a global diffeomorphism $(\pi, t) : \mathcal{M} \to \hat{\mathcal{M}} \times \mathbb{R}$. The fact that
W is a Killing vector field of the rescaled metric $e^{-2f} g$ induces a particular
geometrical structure on $\hat{\mathcal{M}}$. To work this out, we use the one-form (6.93)
and the differential dt of the global timing function to write the spacetime
metric in the form

$$g = e^{2f} \bigl( e^{-2f} g + \phi \otimes \phi - (\phi - dt + dt) \otimes (\phi - dt + dt) \bigr) \tag{6.95}$$

which is a trivial identity. Clearly, the symmetric second rank tensor field
$e^{-2f} g + \phi \otimes \phi$ satisfies $(e^{-2f} g + \phi \otimes \phi)(W, \,\cdot\,) = 0$ and $L_W (e^{-2f} g + \phi \otimes \phi) = 0$,
and it is positive definite on the orthocomplement of W. Hence, it must be
the pull-back of a (positive definite) Riemannian metric $\hat g$ on $\hat{\mathcal{M}}$,

$$e^{-2f} g + \phi \otimes \phi = \pi^* \hat g . \tag{6.96}$$

Similarly, the one-form $\phi - dt$ satisfies $(\phi - dt)(W) = 0$ and $L_W (\phi - dt) = 0$.
Hence, it must be the pull-back of a one-form $\hat\phi$ on $\hat{\mathcal{M}}$,

$$\phi - dt = \pi^* \hat\phi . \tag{6.97}$$

With (6.96) and (6.97) inserted into (6.95), the metric g takes the form

$$g = e^{2f} \bigl( \pi^* \hat g - (\pi^* \hat\phi + dt) \otimes (\pi^* \hat\phi + dt) \bigr) . \tag{6.98}$$

The conformal factor $e^{2f}$ has no influence on the vacuum light rays. Thus,
(6.98) suggests that the metric $\hat g$ and the one-form $\hat\phi$ are the relevant
geometrical objects that determine the reduced ray-optical structures $\hat{\mathcal{N}}_{\omega_0}$. This
is indeed the case as we shall see below. But first we want to check whether $\hat g$
and $\hat\phi$ depend on the choice of the global timing function t. If we change t
according to (6.73), the metric $\hat g$ is obviously unaffected whereas the one-form
$\hat\phi$ transforms like a gauge potential,

$$\hat\phi \longmapsto \hat\phi - dh . \tag{6.99}$$

Thus, the two-form

$$\hat\omega = d\hat\phi \tag{6.100}$$

is independent of which global timing function has been chosen. The
geometrical meaning of $\hat\omega$ is that it measures the rotation (= twist) of the integral
curves of the time-like vector field W. Vanishing of $\hat\omega$ characterizes the
locally static case, i.e., the equation $\hat\omega = 0$ is equivalent to W being locally
hypersurface-orthogonal. If $\hat{\mathcal{M}}$ is simply connected, the equation $\hat\omega = 0$ is
even equivalent to W being globally hypersurface-orthogonal. Let us quickly
prove the second statement which implies, of course, the first one. On a
simply connected manifold the equation $d\hat\phi = 0$ guarantees the existence of a
function h such that $\hat\phi = dh$. (We have used this well-known fact already in
the proof of Proposition 5.5.2.) Thus, a gauge transformation (6.99) with this
function h leads to $\hat\phi' = 0$. Together with (6.97) this shows that, for $\hat{\mathcal{M}}$ simply
connected, the equation $\hat\omega = 0$ is equivalent to the existence of a global
timing function t' such that $\phi - dt' = 0$ and thus, by (6.93), $-e^{-2f} g(W, \,\cdot\,) = dt'$.
Clearly, the latter equation characterizes the case that W is orthogonal to
the hypersurfaces t' = const., i.e., it characterizes the globally static case.

As a preparation for the reduction, we now use the representation (6.98)
of the spacetime metric g to write the dispersion relation for vacuum light
rays in terms of the spatial metric $\hat g$ and the spatial one-form $\hat\phi$. If we use
local coordinates $(x^1, \dots, x^n)$ on $\mathcal{M}$ with $x^n = t$ and $\partial/\partial x^n = W$, (6.98) takes
the form

$$g_{ab}\, dx^a \otimes dx^b = e^{2f} \bigl( \hat g_{\rho\sigma}\, dx^\rho \otimes dx^\sigma - (\hat\phi_\sigma\, dx^\sigma + dt) \otimes (\hat\phi_\rho\, dx^\rho + dt) \bigr) . \tag{6.101}$$

Here and in the following greek indices are running from 1 to n - 1. $\hat g_{\rho\sigma}$ and
$\hat\phi_\sigma$ depend on $(x^1, \dots, x^{n-1})$ whereas f depends on $(x^1, \dots, x^n)$. With the
covariant metric components $g_{ab}$ given by (6.101) it is an easy exercise to calculate
the contravariant metric components $g^{ab}$. This puts the dispersion relation
$g^{ab} p_a p_b = 0$, by which the vacuum ray-optical structure is determined, into
the form

$$\hat g^{\rho\sigma} (p_\rho - p_n \hat\phi_\rho) (p_\sigma - p_n \hat\phi_\sigma) - p_n^{\,2} = 0 . \tag{6.102}$$

Here we have introduced the contravariant components $\hat g^{\mu\sigma}$ of the metric $\hat g$
which are defined by $\hat g^{\mu\rho} \hat g_{\rho\sigma} = \delta^\mu_\sigma$.
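The inversion of the metric components can be organized with an adapted
coframe. The following is only a sketch under the assumption that the metric
has the form $g = e^{2f}(\pi^*\hat g - (\pi^*\hat\phi + dt) \otimes (\pi^*\hat\phi + dt))$; the one-forms
$e^\rho$ and $e^n$ are auxiliary symbols introduced here for illustration.

```latex
\begin{align*}
  e^\rho &:= dx^\rho , \qquad e^n := \hat\phi_\sigma\, dx^\sigma + dt , \\
  g &= e^{2f} \bigl( \hat g_{\rho\sigma}\, e^\rho \otimes e^\sigma - e^n \otimes e^n \bigr) , \\
  p_a\, dx^a &= p_\rho\, dx^\rho + p_n\, dt
     = (p_\rho - p_n \hat\phi_\rho)\, e^\rho + p_n\, e^n , \\
  g^{ab} p_a p_b &= e^{-2f} \Bigl( \hat g^{\rho\sigma}
     (p_\rho - p_n \hat\phi_\rho)(p_\sigma - p_n \hat\phi_\sigma) - p_n^{\,2} \Bigr) .
\end{align*}
```

Since the factor $e^{-2f}$ never vanishes, the condition $g^{ab} p_a p_b = 0$ is
equivalent to the vanishing of the bracket in the last line.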

We are now ready to construct the reduced ray-optical structures $\hat{\mathcal{N}}_{\omega_0}$
according to Theorem 6.5.1. Let us first check if all the assumptions of this
theorem are satisfied in the case at hand. The set $\mathcal{Q}_{\omega_0}$ is non-empty for all real
numbers $\omega_0 \neq 0$. The transversality condition (i) of Theorem 6.5.1 is satisfied
if and only if the one-form dt is nowhere g-light-like whereas the transversality
condition (ii) is always satisfied. Thus, we have to assume that the global
timing function has been chosen in such a way that the hypersurfaces t =
const. are either everywhere space-like or everywhere time-like with respect
to g. Then Theorem 6.5.1 gives us a reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ for all
real numbers $\omega_0 \neq 0$. Please note that the left-hand side of (6.102) gives us a
Hamiltonian H for $\mathcal{N}$ that is independent of the coordinate $x^n$. As in (6.78)
we get a Hamiltonian $\hat H$ for the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ simply by
setting $p_n$ equal to $-\omega_0$,

$$\hat H(x^1, \dots, x^{n-1}, p_1, \dots, p_{n-1}) = \frac{1}{2\omega_0} \Bigl( \hat g^{\mu\sigma} (p_\mu + \omega_0 \hat\phi_\mu) (p_\sigma + \omega_0 \hat\phi_\sigma) - \omega_0^{\,2} \Bigr) , \tag{6.103}$$

where the factor $1/(2\omega_0)$ was introduced for later convenience. Thus, the
dispersion relation of the reduced ray-optical structure reads

$$\hat g^{\mu\sigma} (p_\mu + \omega_0 \hat\phi_\mu) (p_\sigma + \omega_0 \hat\phi_\sigma) - \omega_0^{\,2} = 0 . \tag{6.104}$$

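With the Hamiltonian (6.103), Hamilton’s equations take the form

$$\dot x^\mu = \frac{1}{\omega_0}\, \hat g^{\mu\rho} (p_\rho + \omega_0 \hat\phi_\rho) , \tag{6.105}$$

$$\dot p_\sigma = -\frac{1}{2\omega_0} \frac{\partial \hat g^{\mu\rho}}{\partial x^\sigma} (p_\mu + \omega_0 \hat\phi_\mu) (p_\rho + \omega_0 \hat\phi_\rho) - \frac{\partial \hat\phi_\mu}{\partial x^\sigma}\, \hat g^{\mu\rho} (p_\rho + \omega_0 \hat\phi_\rho) . \tag{6.106}$$

Together with (6.104), the equations (6.105) and (6.106) determine the lifted
rays of $\hat{\mathcal{N}}_{\omega_0}$ in the special parametrization adapted to $\hat H$. (6.105) can be
solved for the momentum coordinates; upon inserting the result into (6.104)
and (6.106), respectively, we find

$$\hat g_{\rho\sigma}\, \dot x^\rho \dot x^\sigma = 1 , \tag{6.107}$$

$$\ddot x^\mu + \hat\Gamma^\mu_{\nu\sigma}\, \dot x^\nu \dot x^\sigma = \hat g^{\mu\rho} \Bigl( \frac{\partial \hat\phi_\rho}{\partial x^\sigma} - \frac{\partial \hat\phi_\sigma}{\partial x^\rho} \Bigr) \dot x^\sigma . \tag{6.108}$$

Here we have introduced the Christoffel symbols

$$\hat\Gamma^\mu_{\nu\sigma} = \frac{1}{2}\, \hat g^{\mu\rho} \Bigl( \frac{\partial \hat g_{\rho\nu}}{\partial x^\sigma} + \frac{\partial \hat g_{\rho\sigma}}{\partial x^\nu} - \frac{\partial \hat g_{\nu\sigma}}{\partial x^\rho} \Bigr) \tag{6.109}$$

of the metric $\hat g$. (6.107) and (6.108) determine the rays of $\hat{\mathcal{N}}_{\omega_0}$ in the
parametrization adapted to $\hat H$. From (6.107) we read that this is the parametrization
by $\hat g$-arc length. In the locally static case, $\hat\omega = d\hat\phi = 0$, the right-hand side of
(6.108) vanishes and the rays are exactly the $\hat g$-geodesics. If $\hat\omega$ does not vanish,
the rays deviate from the $\hat g$-geodesics in response to the “force term” on the
right-hand side of (6.108). This force term has the same formal structure
as the Lorentz force exerted on a charged particle by a magnetostatic field.
In this analogy, the two-form $\hat\omega$ corresponds to the magnetic field strength
and the one-form $\hat\phi$ corresponds to the magnetic potential. This is, of course,
only a formal analogy. In the situation at hand $\hat\omega$ has nothing to do with a
real magnetic field. It is a purely kinematical quantity measuring the rotation
(= twist) of the integral curves of W. Physically, the right-hand side of (6.108)
can be viewed as a Coriolis force.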
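The step from Hamilton's equations to the ray equation with its Coriolis-type
force term can be spelled out as follows. This is a sketch that assumes the
reduced Hamiltonian $\hat H = \frac{1}{2\omega_0}\bigl(\hat g^{\rho\sigma}(p_\rho + \omega_0\hat\phi_\rho)(p_\sigma + \omega_0\hat\phi_\sigma) - \omega_0^2\bigr)$;
the abbreviation $\pi_\sigma = p_\sigma + \omega_0 \hat\phi_\sigma$ is introduced here only for convenience.

```latex
\begin{align*}
  \dot x^\mu &= \frac{\partial \hat H}{\partial p_\mu}
     = \frac{1}{\omega_0}\, \hat g^{\mu\rho} \pi_\rho
     \quad\Longrightarrow\quad
     \pi_\tau = \omega_0\, \hat g_{\tau\rho}\, \dot x^\rho , \\
  \dot p_\tau &= -\frac{\partial \hat H}{\partial x^\tau}
     = -\frac{1}{2\omega_0} \frac{\partial \hat g^{\rho\sigma}}{\partial x^\tau} \pi_\rho \pi_\sigma
       - \hat g^{\rho\sigma} \pi_\rho \frac{\partial \hat\phi_\sigma}{\partial x^\tau}
     = \frac{\omega_0}{2} \frac{\partial \hat g_{\alpha\beta}}{\partial x^\tau} \dot x^\alpha \dot x^\beta
       - \omega_0 \frac{\partial \hat\phi_\sigma}{\partial x^\tau} \dot x^\sigma , \\
  \frac{d\pi_\tau}{ds} &= \dot p_\tau + \omega_0 \frac{\partial \hat\phi_\tau}{\partial x^\sigma} \dot x^\sigma
     = \omega_0 \Bigl( \hat g_{\tau\rho}\, \ddot x^\rho
       + \frac{\partial \hat g_{\tau\rho}}{\partial x^\sigma} \dot x^\sigma \dot x^\rho \Bigr) , \\
  \hat g_{\tau\rho} \bigl( \ddot x^\rho + \hat\Gamma^\rho_{\alpha\beta}\, \dot x^\alpha \dot x^\beta \bigr)
     &= \Bigl( \frac{\partial \hat\phi_\tau}{\partial x^\sigma}
        - \frac{\partial \hat\phi_\sigma}{\partial x^\tau} \Bigr) \dot x^\sigma .
\end{align*}
```

The second line uses $\partial_\tau \hat g^{\rho\sigma}\, \hat g_{\rho\alpha} \hat g_{\sigma\beta} = -\partial_\tau \hat g_{\alpha\beta}$, and the last line
combines the second and third. The parameter $\omega_0$ drops out, and $\hat\phi$ enters
only through the antisymmetrized derivative, i.e., through $d\hat\phi$.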

Since $\hat\phi$ enters into (6.108) only in terms of $\hat\omega = d\hat\phi$, the rays of $\hat{\mathcal{N}}_{\omega_0}$
are gauge invariant although the lifted rays are not. We know already from
our discussion following Theorem 6.5.1 that this is a general feature of the
reduction formalism. Moreover, as neither (6.107) nor (6.108) involves the
parameter $\omega_0$, all the reduced ray-optical structures $\hat{\mathcal{N}}_{\omega_0}$, for $\omega_0 \in \mathbb{R} \setminus \{0\}$,
give the same rays. This observation exemplifies Proposition 6.5.2 since the
vacuum ray-optical structure is dilation-invariant and reversible.
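Both statements can be illustrated numerically. The following is only a sketch
under simplifying assumptions that are not taken from the text: the spatial
metric is the flat metric on the plane, the one-form is $\hat\phi = \tfrac{a}{2}(-y\,dx + x\,dy)$
with a hypothetical constant a (called TWIST below), and the gauge function is
the hypothetical choice h(x, y) = xy + 0.3x. Hamilton's equations for
$\hat H = \frac{1}{2\omega_0}(|p + \omega_0\hat\phi|^2 - \omega_0^2)$ are integrated with a classical Runge-Kutta
scheme for several values of $\omega_0$ and for the gauge-transformed one-form.

```python
import math

TWIST = 0.8   # hypothetical constant component of d(phi)
S_END = 2.0   # arc length over which the ray is followed

def phi(x, y):
    """One-form phi = (TWIST/2) * (-y dx + x dy)."""
    return (-0.5 * TWIST * y, 0.5 * TWIST * x)

def dphi(x, y):
    """Entry [tau][sigma] is the derivative of phi_sigma with respect to x^tau."""
    return ((0.0, 0.5 * TWIST), (-0.5 * TWIST, 0.0))

def phi_gauged(x, y):
    """phi - dh for the hypothetical gauge function h(x, y) = x*y + 0.3*x."""
    fx, fy = phi(x, y)
    return (fx - (y + 0.3), fy - x)

def dphi_gauged(x, y):
    return ((0.0, 0.5 * TWIST - 1.0), (-0.5 * TWIST - 1.0, 0.0))

def integrate_ray(one_form, d_one_form, omega0, x0, v0, s_end=S_END, steps=2000):
    """RK4 integration of Hamilton's equations; returns the final position."""
    def rhs(state):
        x, y, px, py = state
        fx, fy = one_form(x, y)
        kx, ky = px + omega0 * fx, py + omega0 * fy   # p + omega0 * phi
        jac = d_one_form(x, y)
        return (kx / omega0, ky / omega0,
                -(kx * jac[0][0] + ky * jac[0][1]),
                -(kx * jac[1][0] + ky * jac[1][1]))
    fx, fy = one_form(*x0)
    # Initial momentum chosen such that the initial velocity is the unit vector v0.
    state = (x0[0], x0[1], omega0 * (v0[0] - fx), omega0 * (v0[1] - fy))
    h = s_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state[0], state[1]

def exact_end(x0, theta0, s):
    """Closed-form end point: the rays are circles of radius 1/TWIST."""
    return (x0[0] + (math.sin(theta0 + TWIST * s) - math.sin(theta0)) / TWIST,
            x0[1] - (math.cos(theta0 + TWIST * s) - math.cos(theta0)) / TWIST)

theta0 = 0.4
x0 = (0.2, -0.1)
v0 = (math.cos(theta0), math.sin(theta0))

end_a = integrate_ray(phi, dphi, 1.0, x0, v0)                 # reference
end_b = integrate_ray(phi, dphi, -2.0, x0, v0)                # other frequency, other sign
end_c = integrate_ray(phi_gauged, dphi_gauged, 3.5, x0, v0)   # other gauge
print(end_a, end_b, end_c, exact_end(x0, theta0, S_END))
```

The momenta evolve differently in the three runs, but the spatial end points
agree up to integration error, as expected from the gauge invariance of the
rays and their independence of $\omega_0$.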

We shall now derive an expression for the optical path length, which was
introduced in Definition 6.5.3, in terms of the Riemannian metric $\hat g$ and the
one-form $\hat\phi$. (6.104) and (6.105) determine the lifted virtual rays of $\hat{\mathcal{N}}_{\omega_0}$ in
the parametrization adapted to $\hat H$. These equations imply

$$p_\sigma\, \dot x^\sigma = \omega_0 \Bigl( \sqrt{\hat g_{\rho\sigma}\, \dot x^\rho \dot x^\sigma} - \hat\phi_\rho\, \dot x^\rho \Bigr) . \tag{6.110}$$

We can now free ourselves from the particular parametrization. Clearly, an
orientation-preserving reparametrization leaves (6.110) unchanged whereas
an orientation-reversing reparametrization requires replacing the positive
square-root with the negative square-root. Since $\hat\theta = p_\sigma\, dx^\sigma$, (6.110) gives us
the integrand of the optical path length (6.84). If we switch back to invariant
notation, the optical path length of a lifted virtual ray $\hat\xi : [s_1, s_2] \to \hat{\mathcal{N}}_{\omega_0}$
takes the form

$$I(\hat\xi) = \int_{s_1}^{s_2} \Bigl( \pm\sqrt{\hat g(\dot{\hat\lambda}, \dot{\hat\lambda})} - \hat\phi(\dot{\hat\lambda}) \Bigr)(s)\, ds \tag{6.111}$$

where $\hat\lambda : [s_1, s_2] \to \hat{\mathcal{M}}$ is the projection of $\hat\xi$ to $\hat{\mathcal{M}}$. As the corresponding
spacetime curve $\lambda$, which projects to $\hat\lambda$, is light-like, we can read from (6.98)
that

$$\pm\sqrt{\hat g(\dot{\hat\lambda}, \dot{\hat\lambda})} - \hat\phi(\dot{\hat\lambda}) = dt(\dot\lambda) . \tag{6.112}$$

Comparison with (6.93) and (6.97) shows that the positive square-root in
(6.112) corresponds to the case that the parametrization of $\lambda$ is
future-oriented with respect to W, i.e., $g(W, \dot\lambda) < 0$. Inserting (6.112) into (6.111)
demonstrates that, in the case at hand, the optical path length of $\hat\xi$ is equal
to the travel time with respect to the global timing function used for the
reduction. The same result follows from Proposition 6.5.3, using the fact that
the vacuum ray-optical structure is dilation-invariant.

(6.111) clearly shows that, up to an orientation-dependent sign ambiguity,
the optical path length of $\hat\xi$ is determined by its projection $\hat\lambda$. We have already
mentioned that this is true whenever the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$
is strongly hyperregular. In the case at hand $\hat{\mathcal{N}}_{\omega_0}$ has the additional property
that every $C^\infty$ curve in $\hat{\mathcal{M}}$ is a virtual ray. For this reason the optical path
length can be viewed as a functional on the set of all $C^\infty$ curves $\hat\lambda$ in $\hat{\mathcal{M}}$,
given by the right-hand side of (6.111).

(6.111) again exemplifies the gauge dependence of the optical path length.
In the globally static case we can choose the global timing function in such
a way that $\hat\phi$ vanishes. In this distinguished gauge the optical path length
coincides with the $\hat g$-arc length. In the stationary but non-static case, however,
the gauge freedom in the definition of the optical path length cannot be
removed.

We have now established all the relevant equations of the reduction
formalism for stationary vacuum ray-optical structures. Examples will be given
in Chap. 8 below, where the metric $\hat g$ and the gauge-dependent one-form $\hat\phi$
are calculated for several (conformally) stationary spacetimes $(\mathcal{M}, g)$ with
relevance to physics. For examples of this kind we also refer to Abramowicz,
Carter and Lasota [3] and to Perlick [109].

For light propagation in matter, the reduction formalism has, in general,
rather different features in comparison to the vacuum case. In particular,
the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ is, in general, not determined by a
Riemannian metric and by a one-form which is unique up to gauge
transformations. However, there is a special class of (non-dispersive) media to which
our vacuum results immediately carry over, viz., media characterized by a
Lorentzian “optical metric”. So let us consider a ray-optical structure $\mathcal{N}$ on
$(\mathcal{M}, g)$ which is of the kind given in Example 5.1.1, i.e., let us assume that
$\mathcal{N} \subset T^*\mathcal{M}$ is the null cone bundle of a Lorentzian metric $g_o$. If $g_o$ is
conformally equivalent to g, then $\mathcal{N}$ is the vacuum ray-optical structure on $(\mathcal{M}, g)$,
otherwise $\mathcal{N}$ gives light propagation in a medium. We assume that $\mathcal{N}$ is
stationary, i.e., we assume that there is a vector field W which is time-like with
respect to g and a conformal Killing field with respect to $g_o$. If W is time-like
with respect to $g_o$ as well, and if we can find a global timing function for
W, then we can carry through the reduction procedure in analogy to the
vacuum case. Now it is, of course, the optical metric $g_o$ that is decomposed
in the form (6.98). Hence, vanishing of the induced one-form $\hat\phi$ implies that
W is orthogonal to the hypersurfaces t = const. with respect to the optical
metric $g_o$ and does not characterize the globally static case. Similarly,
it is now the metric $g_o$ with respect to which the hypersurfaces t = const.
have to be non-light-like in order to guarantee that the assumptions of
Theorem 6.5.1 are satisfied for all $\omega_0 \neq 0$. In this situation the rays of the reduced
ray-optical structure are, again, determined by equations of the form (6.107)
and (6.108), and the optical path length is, again, representable in the form
(6.111). Explicit examples of this kind will be given in Chap. 8 below.

One of the most interesting aspects of the reduction formalism for
stationary ray-optical structures is that it provides a link between our general
relativistic Lorentzian geometry setting of ray optics and the ordinary Euclidean
geometry setting of elementary textbook ray optics. Roughly speaking, ray
optics in media, as it is treated in elementary optics textbooks, can be viewed
as the result of our reduction process applied to an appropriate ray-optical
structure on Minkowski space. If $(\mathcal{M}, g)$ is n-dimensional Minkowski space,
we can use pseudo-Cartesian coordinates $(x^1, \dots, x^n = t)$ to identify $\mathcal{M}$
with $\mathbb{R}^n$ and to put g into the form

$$g = \delta_{\rho\sigma}\, dx^\rho \otimes dx^\sigma - dt \otimes dt . \tag{6.113}$$

Here, as before, the summation convention is used for greek indices running
from 1 to n - 1. This induces a natural chart $(x^1, \dots, x^n, p_1, \dots, p_n)$ globally
on $T^*\mathcal{M}$. Up to a minus sign, the momentum coordinate $p_n = \theta(W)$ gives
the frequency with respect to the inertial system $V = W = \partial/\partial t$ for any
ray-optical structure $\mathcal{N}$ on $\mathcal{M}$. Now let us consider a ray-optical structure $\mathcal{N}$ on
$\mathcal{M}$ such that all the matter functions that enter into the dispersion relation
are independent of the time coordinate $x^n = t$. This implies that $W = \partial/\partial t \in \mathcal{G}_{\mathcal{N}}$,
i.e., it implies that $\mathcal{N}$ is stationary. Since $g(W, \,\cdot\,) = -dt$, $\mathcal{N}$ is then even
globally static and the time coordinate $x^n = t$ gives a distinguished global
timing function. For all frequency values $\omega_0 \in \mathbb{R}$ for which the assumptions
of Theorem 6.5.1 are satisfied, the reduction formalism gives us a reduced
ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ on Euclidean space $\hat{\mathcal{M}} = \mathbb{R}^{n-1}$. The physically
interesting case is, of course, n = 4. Any sort of medium treated in elementary
optics textbooks can be modeled in terms of such a one-parameter family of
ray-optical structures on $\hat{\mathcal{M}} = \mathbb{R}^3$.

Here is a special example of this construction. Let us assume that the
ray-optical structure $\mathcal{N}$ on Minkowski space is given by the equation

$$\delta^{\rho\sigma} p_\rho p_\sigma - n(x^1, \dots, x^{n-1}, -p_n)^2\, p_n^{\,2} = 0 \tag{6.114}$$

where $n : \mathbb{R}^{n-1} \times \mathbb{R} \to \mathbb{R}^+$ is a $C^\infty$ function. In the terminology of Sect. 6.3,
$\mathcal{N}$ is isotropic with respect to the inertial system $V = W = \partial/\partial t$. The
function n gives the index of refraction which is assumed to be independent
of the time coordinate $t = x^n$ whereas it may depend on the frequency $-p_n$.
In this situation the assumptions of Theorem 6.5.1 are satisfied for all $\omega_0 \neq 0$.
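From (6.114) we read that the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ is governed
by the global Hamiltonian

$$\hat H(x^1, \dots, x^{n-1}, p_1, \dots, p_{n-1}) = \frac{1}{2} \Bigl( \frac{\delta^{\rho\sigma} p_\rho p_\sigma}{n(x^1, \dots, x^{n-1}, \omega_0)^2} - \omega_0^{\,2} \Bigr) . \tag{6.115}$$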

This implies that the rays of $\hat{\mathcal{N}}_{\omega_0}$ are the geodesics of the conformally flat
Riemannian metric

$$\hat g_{\rho\sigma} = n(\,\cdot\,, \omega_0)^2\, \delta_{\rho\sigma} \tag{6.116}$$

on $\hat{\mathcal{M}}$ which depends, of course, on the frequency value $\omega_0$. For a lifted
virtual ray $\hat\xi$ of $\hat{\mathcal{N}}_{\omega_0}$, the optical path length $I(\hat\xi)$ equals the $\hat g$-arclength
of its projection $\hat\lambda$ to $\hat{\mathcal{M}}$, where $\hat g$ denotes the Riemannian metric given by
(6.116). We have thus rediscovered the standard textbook formulae for ray
optics in dispersive isotropic media.




7. Variational principles for rays 



In this chapter we want to characterize the rays of a ray optical structure 
in terms of variational principles. In particular, we want to investigate for 
what kind of ray optical structures some version of Fermat’s principle holds 
true. This question is of interest not only from an abstract theoretical point 
of view but also in view of applications. 

Most elementary optics textbooks, such as, e.g., Born and Wolf [16], give a
formulation of Fermat’s principle for non-dispersive and isotropic media only. 
However, generalizations to more complicated media are known, see, e.g., 
Newcomb [100] . If we allow for dispersive and anisotropic media, Fermat’s 
principle in ordinary optics can be phrased in the following way. 

Fix two points in space and a frequency value $\omega_0$. Consider all
possibilities to go, along different spatial paths, from one point to the
other at the velocity of light as it is determined, for the frequency $\omega_0$,
by the medium considered. Among all these “trial curves” , the actual 
light rays are then the local extrema and saddle-points of a certain 
functional which is called the optical path length. If the medium is 
non-dispersive it is not necessary to specify the frequency and the 
optical path length can be reinterpreted as travel time. 

Whenever a variational principle can be viewed as a mathematical re- 
formulation of this statement it is legitimate to consider it as a version of 
Fermat’s principle. In the following we establish several variational princi- 
ples for the rays of a ray-optical structure, and we discuss their relation to 
Fermat’s principle. 
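The simplest textbook instance of this statement is refraction at a plane
interface. The following is only a sketch under assumptions made here for
illustration: two homogeneous isotropic half-spaces with indices of refraction
n1 (for y > 0) and n2 (for y < 0) at the fixed frequency, trial curves that are
broken straight lines through an interface point (x, 0), and an optical path
length given by index of refraction times Euclidean length. All function names
and numerical values are hypothetical.

```python
import math

def optical_path_length(x, n1, n2, a=(0.0, 1.0), b=(1.0, -1.0)):
    """Optical path length of the broken line a -> (x, 0) -> b."""
    return n1 * math.hypot(x - a[0], a[1]) + n2 * math.hypot(b[0] - x, b[1])

def minimize(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

n1, n2 = 1.0, 1.5   # hypothetical indices of refraction at the chosen frequency
x_star = minimize(lambda x: optical_path_length(x, n1, n2), 0.0, 1.0)

# Sines of the angles of incidence and refraction, measured from the normal.
sin1 = x_star / math.hypot(x_star, 1.0)
sin2 = (1.0 - x_star) / math.hypot(1.0 - x_star, 1.0)
print(x_star, n1 * sin1, n2 * sin2)
```

At the minimizing interface point the two printed products agree to within the
accuracy of the search, which is Snell's law of refraction. For a dispersive
medium the indices, and hence the minimizing path, depend on the chosen
frequency.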



7.1 The principle of stationary action: The general case 

In classical mechanics it is usual to define the action functional on $C^\infty$ curves
$\xi : [s_1, s_2] \to T^*\mathcal{M}$ by integration over the canonical one-form $\theta$, i.e.,

$$A(\xi) = \int_\xi \theta = \int_{s_1}^{s_2} \theta_{\xi(s)}\bigl(\dot\xi(s)\bigr)\, ds . \tag{7.1}$$

Actually, the action functional can be defined on curves of a more general
differentiability class; for the time being, however, we stick to the $C^\infty$ case.

From standard textbooks on classical mechanics we know that the
solutions of Hamilton’s equations satisfy a principle of stationary action. There-
fore, it should not come as a surprise that the lifted rays of an arbitrary ray- 
optical structure on M satisfy a principle of stationary action as well. In com- 
parison to the situation of classical mechanics there are three modifications. 
First, for an arbitrary ray-optical structure the existence of a Hamiltonian 
is guaranteed only locally. Second, lifted rays have to satisfy the dispersion 
relation, i.e., in mechanical terminology they are restricted to the “energy 
surface” H = 0. Third, lifted rays can be reparametrized arbitrarily which is 
reflected by the arbitrary stretching factor k in the ray equations (5.9). Mim- 
icking standard techniques of classical mechanics, but taking care of these 
three modifications, we find the following principle of stationary action for 
lifted rays of an arbitrary ray-optical structure, cf. Figure 7.1. 

Theorem 7.1.1. (Principle of stationary action for lifted rays) Let
$\mathcal{N}$ be an arbitrary ray-optical structure on $\mathcal{M}$ and fix a $C^\infty$ immersion
$\xi : [s_1, s_2] \to \mathcal{N}$. As allowed variations of $\xi$ consider all $C^\infty$ maps
$\eta : \,]-\varepsilon_0, \varepsilon_0[\, \times [s_1, s_2] \to \mathcal{N}$ with $\eta(0, \,\cdot\,) = \xi$ for which the tangent vectors
to the curves $\eta(\,\cdot\,, s_1)$ and $\eta(\,\cdot\,, s_2)$ are annihilated by the canonical one-form
$\theta$. Then the following holds true.

(a) If $\xi$ is a lifted ray of $\mathcal{N}$, then $\xi$ is a stationary point of A,

$$\frac{d}{d\varepsilon} A\bigl(\eta(\varepsilon, \,\cdot\,)\bigr)\Big|_{\varepsilon = 0} = 0 \tag{7.2}$$

for all allowed variations $\eta$ of $\xi$.

(b) Conversely, if (7.2) is true at least for all allowed variations $\eta$ of $\xi$ with
fixed endpoints in $\mathcal{N}$, i.e., with $\eta(\,\cdot\,, s_1)$ and $\eta(\,\cdot\,, s_2)$ constant, then $\xi$ is
a lifted ray.

Proof. Let $\eta$ be an allowed variation of $\xi$ and denote the pertaining
variational vector field by $Y : [s_1, s_2] \to T\mathcal{N}$, i.e.,

$$Y(s) = \frac{\partial}{\partial\varepsilon}\, \eta(\varepsilon, s)\Big|_{\varepsilon = 0} . \tag{7.3}$$

Then the variational derivative of the action can be calculated in the following
way, using standard derivative rules.

$$\begin{aligned}
\frac{d}{d\varepsilon} A\bigl(\eta(\varepsilon, \,\cdot\,)\bigr)\Big|_{\varepsilon = 0}
 &= \int_{s_1}^{s_2} \Bigl( (d\theta)_{\xi(s)}\bigl(Y(s), \dot\xi(s)\bigr) + \frac{d}{ds}\bigl(\theta_{\xi(s)}(Y(s))\bigr) \Bigr)\, ds \\
 &= -\int_{s_1}^{s_2} \Omega_{\xi(s)}\bigl(Y(s), \dot\xi(s)\bigr)\, ds + \theta_{\xi(s_2)}\bigl(Y(s_2)\bigr) - \theta_{\xi(s_1)}\bigl(Y(s_1)\bigr) \\
 &= \int_{s_1}^{s_2} \Omega_{\xi(s)}\bigl(\dot\xi(s), Y(s)\bigr)\, ds .
\end{aligned} \tag{7.4}$$

In the last step we have used the fact that $Y(s_1)$ and $Y(s_2)$ are annihilated
by $\theta$ because $\eta$ is an allowed variation. Since our allowed variations stay
within $\mathcal{N}$, Y must be everywhere tangent to $\mathcal{N}$. Thus, we can read from
(7.1) that part (a) of the proposition follows directly from Definition 5.1.2.
To prove part (b) we assume that the last integral in (7.1) vanishes for all
$C^\infty$ vector fields Y along $\xi$ that are everywhere tangent to $\mathcal{N}$ and vanish at
the endpoints. Hence the fundamental lemma of variational calculus implies
that $\xi$ must be a lifted ray according to Definition 5.1.2.

Fig. 7.1. An allowed variation, in the sense of Theorem 7.1.1, must be completely
contained in $\mathcal{N}$.

Alternative proof. For those readers who feel more comfortable with traditional
index notation we give an alternative proof. We assume that $\xi$ can be
covered by the domain of a Hamiltonian and of a natural chart, and we write
$\delta$ for the derivative with respect to $\varepsilon$ at $\varepsilon=0$, as in (5.46), (5.47) and (5.48).
Then (7.4) takes the form

$$\begin{aligned}
\delta\mathcal{A}&=\delta\int_{s_1}^{s_2}p_a(s)\,\dot x^a(s)\,ds\\
&=\int_{s_1}^{s_2}\delta p_a(s)\,\dot x^a(s)\,ds+\int_{s_1}^{s_2}p_a(s)\,\delta\dot x^a(s)\,ds\\
&=\int_{s_1}^{s_2}\delta p_a(s)\,\dot x^a(s)\,ds+p_a(s_2)\,\delta x^a(s_2)-p_a(s_1)\,\delta x^a(s_1)-\int_{s_1}^{s_2}\dot p_a(s)\,\delta x^a(s)\,ds\\
&=\int_{s_1}^{s_2}\delta p_a(s)\,\dot x^a(s)\,ds-\int_{s_1}^{s_2}\dot p_a(s)\,\delta x^a(s)\,ds.
\end{aligned}\tag{7.5}$$

Since all allowed variations are confined to $\mathcal{N}$, they have to preserve the
constraint equation $H(x(s),p(s))=0$, i.e.,

$$\frac{\partial H}{\partial x^a}\big(x(s),p(s)\big)\,\delta x^a(s)+\frac{\partial H}{\partial p_a}\big(x(s),p(s)\big)\,\delta p_a(s)=0. \tag{7.6}$$

To prove part (a) we recall that lifted rays satisfy (5.11) and (5.12). Upon
inserting these equations into (7.5) the desired result follows immediately
with (7.6). To prove part (b) we insert (7.6) into (7.5) with the help of a
Lagrange multiplier $k(s)$. This results in

$$\begin{aligned}
\delta\mathcal{A}={}&\int_{s_1}^{s_2}\delta p_a(s)\Big(\dot x^a(s)-k(s)\,\frac{\partial H}{\partial p_a}\big(x(s),p(s)\big)\Big)\,ds\\
&-\int_{s_1}^{s_2}\delta x^a(s)\Big(\dot p_a(s)+k(s)\,\frac{\partial H}{\partial x^a}\big(x(s),p(s)\big)\Big)\,ds.
\end{aligned}\tag{7.7}$$

By assumption, the right-hand side of (7.7) is equal to zero for all smooth
$\delta p_a$ and $\delta x^a$ that vanish at $s_1$ and $s_2$. (The Lagrange multiplier $k(s)$ allows
us to disregard the constraint (7.6).) Hence, the fundamental lemma of variational calculus
implies that the ray equations (5.11) and (5.12) have to be satisfied. □



Theorem 7.1.1 implies, in particular, that $\xi$ is a lifted ray if and only if
$\frac{d}{d\varepsilon}\mathcal{A}(\eta(\varepsilon,\cdot))\big|_{\varepsilon=0}=0$ for all allowed variations for which the curves $\eta(\cdot,s_1)$
and $\eta(\cdot,s_2)$ are vertical, i.e., for variations that keep the endpoints fixed in
$\mathcal{M}$.

As $\mathcal{A}(\xi)$ is obviously invariant under orientation-preserving reparametrizations
of $\xi$, we could equally well allow for variations that change the parameter
interval $[s_1,s_2]$. However, such a generalization of Theorem 7.1.1 will not be
needed in the following.

We emphasize that it is not justified to use the name “principle of minimal 
action”, rather than “principle of stationary action”, for Theorem 7.1.1. In 
general, a lifted ray can be a local minimum, a local maximum, or a saddle- 
point of the action functional. This is true even if we restrict to arbitrarily 
short rays. 

The advantage of Theorem 7.1.1 is in the fact that it holds for arbitrary
ray-optical structures. In particular, no regularity assumption is needed. Its
disadvantage is in the fact that a very big set of allowed variations is used.
Apart from the boundary conditions, all curves in $\mathcal{N}$, i.e., all curves for
which the momenta satisfy the dispersion relation, are to be considered as
“trial curves” $\eta(\varepsilon,\cdot)$. This includes curves with arbitrary velocities and not
only motions at the velocity of light. For this reason Theorem 7.1.1 cannot
be viewed as a version of Fermat’s principle as it was stated in the beginning
of this chapter. If we restrict the set of trial curves to motions at the velocity
of light, in general we will not have enough allowed variations to prove an
analogue of part (b) of Theorem 7.1.1. (There is, of course, no problem with
part (a).) In Sect. 7.2 we shall see that there are still enough allowed variations
if $\mathcal{N}$ is strongly (hyper-)regular.

Theorem 7.1.1 has several interesting consequences. To give just one ex- 
ample, we now show that part (a) of Theorem 7.1.1 leads to the familiar 
Huyghens construction of (lifted) wave surfaces. Here we have to recall the 
notions of generalized solutions of the eikonal equation and of lifted wave 
surfaces from Sect. 5.5. 

Theorem 7.1.2. (Huyghens construction) Let $\mathcal{N}$ be a ray-optical structure
on $\mathcal{M}$ which is everywhere transverse to the flow of the Euler vector field.
Let $\Phi$ be the flow of the distinguished characteristic vector field $X$ on $\mathcal{N}$ which
is determined by the equation $\theta_{\mathcal{N}}(X)=1$. Fix a generalized solution $\mathcal{C}$ of the
eikonal equation of $\mathcal{N}$ and a lifted wave surface $\mathcal{S}$ of $\mathcal{C}$. Let $s\in\mathbb{R}$ be such
that the set $\Phi_s(\mathcal{S})$ is non-empty and, thus, again a lifted wave surface of $\mathcal{C}$.
Then $\Phi_s(\mathcal{S})$ can be constructed as the envelope of the surfaces $\Phi_s(\mathcal{N}_q)$ where
$q$ ranges over all points in $\mathcal{M}$ that can be represented in the form $q=\tau^*_{\mathcal{M}}(u)$
with some $u\in\mathcal{S}$.

Proof. Choose any $u\in\mathcal{S}$ and let $\xi$ denote the maximal integral
curve of the distinguished characteristic vector field $X$ on $\mathcal{N}$ with $\xi(0)=u$.
We have to prove that at the point $\xi(s)$ the $(n-1)$-dimensional surfaces
$\Phi_s(\mathcal{S})$ and $\Phi_s(\mathcal{N}_q)$ are tangent to each other where $q=\tau^*_{\mathcal{M}}(u)$. To that end
we consider all $C^\infty$ maps $\eta\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[0,s]\to\mathcal{N}$ with $\eta(0,\cdot)=\xi$ such that
the varied curves $\eta(\varepsilon,\cdot)$ are lifted rays with $\eta(\varepsilon,0)\in\mathcal{N}_q$ and $\eta(\varepsilon,s)\in\Phi_s(\mathcal{S})$
for all $\varepsilon\in\,]-\varepsilon_0,\varepsilon_0[$. By Theorem 7.1.1 (a), any such $\eta$ satisfies the condition
$\frac{d}{d\varepsilon}\mathcal{A}(\eta(\varepsilon,\cdot))\big|_{\varepsilon=0}=0$. This implies that for any such $\eta$ the vector $\eta(\,\cdot\,,s)^{\textstyle\cdot}(0)$ is
tangent to $\Phi_s(\mathcal{N}_q)$. Since all elements of $T_{\xi(s)}\big(\Phi_s(\mathcal{S})\big)$ can be represented in
this form, $\Phi_s(\mathcal{S})$ and $\Phi_s(\mathcal{N}_q)$ must be tangent to each other at $\xi(s)$. □



Fig. 7.2. Theorem 7.1.2 leads to the familiar Huyghens construction in $\mathcal{M}$ according
to which wave surfaces are the envelopes of “elementary waves”. Please note
that transversality of the ray-optical structure to the flow of the Euler vector field
is necessary. This is the case, e.g., for Example 5.1.2, where the “elementary waves”
are hyperboloids, and for Example 5.1.5, where the “elementary waves” are spheres.
Panels: (a) Example 5.1.2, (b) Example 5.1.5.



Upon projection to $\mathcal{M}$, Theorem 7.1.2 gives the familiar Huyghens construction
according to which the wave surfaces (projections of $\Phi_s(\mathcal{S})$) are
the envelopes of “elementary waves” (projections of $\Phi_s(\mathcal{N}_q)$), see Figure 7.2.
Clearly, the projections to $\mathcal{M}$ of $\Phi_s(\mathcal{S})$ and $\Phi_s(\mathcal{N}_q)$ are smooth submanifolds
only if $\Phi_s(\mathcal{S})$ and $\Phi_s(\mathcal{N}_q)$ are transverse to the fibers.

It is to be emphasized that Theorem 7.1.2 has to presuppose a ray-optical
structure $\mathcal{N}$ that is transverse to the flow of the Euler vector field since
otherwise a characteristic vector field $X$ with $\theta_{\mathcal{N}}(X)=1$ does not exist. In
particular, Theorem 7.1.2 does not apply to ray-optical structures that give
light propagation in a non-dispersive medium on a spacetime. For this reason
Theorem 7.1.2 has more relevance for ray-optical structures on space rather
than on spacetime.
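
As a rough numerical illustration of the projected Huyghens construction, the following Python sketch treats the simplest setting one can think of: the Euclidean plane with circular “elementary waves” of radius $s$. This setting, the function name `outer_envelope_radius` and all numerical values are assumptions made here for illustration; they are not taken from the examples of Chap. 5. Starting from a circular wave surface of radius $R$, the sketch computes the outer envelope of the elementary waves centred on that surface and compares it with the expected wave surface of radius $R+s$.

```python
import math

def outer_envelope_radius(phi, s, centers):
    """Farthest point from the origin, in direction phi, on any circle of radius s."""
    ux, uy = math.cos(phi), math.sin(phi)
    best = 0.0
    for cx, cy in centers:
        along = cx * ux + cy * uy                  # component of the centre along the direction
        perp2 = cx * cx + cy * cy - along * along  # squared distance of the centre from the line
        if perp2 <= s * s:                         # the line through the origin meets this circle
            best = max(best, along + math.sqrt(s * s - perp2))
    return best

# Initial wave surface: circle of radius R, sampled by N centres of elementary waves.
R, s, N = 2.0, 0.75, 3600
centers = [(R * math.cos(2 * math.pi * k / N), R * math.sin(2 * math.pi * k / N))
           for k in range(N)]

# Radius of the outer envelope in a few arbitrary directions.
radii = [outer_envelope_radius(phi, s, centers) for phi in (0.0, 0.5, 1.234, 3.0)]
```

In every sampled direction the envelope radius agrees with $R+s$ up to the discretization error of the centres, which is what the Huyghens construction predicts for this simple case.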



7.2 The principle of stationary action: The strongly regular case

We have already stressed that Theorem 7.1.1 cannot be viewed as a version
of Fermat’s principle since the trial curves are not restricted to motions
at the velocity of light. What we want to have is a theorem, analogous to
Theorem 7.1.1, where only lifted virtual rays are considered as trial curves
rather than arbitrary curves in $\mathcal{N}$. (Please recall that lifted virtual rays are
defined through Definition 5.2.4.) We are now going to show that such a
theorem holds true for strongly regular ray-optical structures. Contrary to
Theorem 7.1.1 we restrict to variations with fixed end-points in $\mathcal{M}$.

Theorem 7.2.1. Let $\mathcal{N}$ be a strongly regular ray-optical structure on $\mathcal{M}$ and
fix a lifted virtual ray $\xi\colon[s_1,s_2]\to\mathcal{N}$ of $\mathcal{N}$. As allowed variations of $\xi$
we consider all $C^\infty$ maps $\eta\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[s_1,s_2]\to\mathcal{N}$ with $\eta(0,\cdot)=\xi$ for
which the curves $\eta(\varepsilon,\cdot)$ are lifted virtual rays for all $\varepsilon\in\,]-\varepsilon_0,\varepsilon_0[$ and the
curves $\eta(\cdot,s_1)$ and $\eta(\cdot,s_2)$ are vertical. Then $\xi$ is a lifted ray if and only if
$\frac{d}{d\varepsilon}\mathcal{A}(\eta(\varepsilon,\cdot))\big|_{\varepsilon=0}=0$ for all allowed variations $\eta$ of $\xi$.

Proof. Since allowed variations in the sense of this theorem are, in particular,
allowed variations in the sense of Theorem 7.1.1, the “only if” part is
a trivial consequence of part (a) of Theorem 7.1.1. To prove the “if” part,
let $Z\colon[s_1,s_2]\to T\mathcal{N}$ be any $C^\infty$ vector field along $\xi$ with $Z(s_1)=0$ and
$Z(s_2)=0$. Then we can find a $C^\infty$ map $\mu\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[s_1,s_2]\to\mathcal{N}$ with
$\mu(0,\cdot)=\xi$, $\mu(\varepsilon,s_1)=\xi(s_1)$ and $\mu(\varepsilon,s_2)=\xi(s_2)$ for all $\varepsilon\in\,]-\varepsilon_0,\varepsilon_0[$ such
that $Z$ is the pertaining variational vector field, i.e., $Z(s)=\mu(\,\cdot\,,s)^{\textstyle\cdot}(0)$. In
general the curves $\mu(\varepsilon,\cdot)$ will not be lifted virtual rays, so $\mu$ will not be an
allowed variation of $\xi$ in the sense of this theorem. Therefore we consider the
projection $\kappa=\tau^*_{\mathcal{M}}\circ\mu\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[s_1,s_2]\to\mathcal{M}$ of $\mu$ to $\mathcal{M}$ which gives a
variation of $\lambda=\tau^*_{\mathcal{M}}\circ\xi$ with fixed end-points. Now we have to recall that strong
regularity guarantees local solvability of (5.10) and (5.11) for the momenta
and for the factor $k$. By compactness of the interval $[s_1,s_2]$ this guarantees
existence and uniqueness of an allowed variation $\eta\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[s_1,s_2]\to\mathcal{N}$
of $\xi$, for $\varepsilon_0$ sufficiently small, that projects onto $\kappa$. We denote the variational
vector field of $\eta$ by $Y$ as in (7.3). Now we have two variations $\mu$ and $\eta$ of $\xi$
both of which project onto $\kappa$. Hence, the difference between the pertaining
variational vector fields $Z$ and $Y$ must be vertical. Since $\eta$ is, in particular,
an allowed variation in the sense of Theorem 7.1.1, (7.4) holds true. By
hypothesis, $0=\frac{d}{d\varepsilon}\mathcal{A}(\eta(\varepsilon,\cdot))\big|_{\varepsilon=0}$. Thus, the last integral in (7.4) has to vanish.
Since $Y-Z\colon[s_1,s_2]\to T\mathcal{N}$ is vertical and $\xi$ is a lifted virtual ray, this
integral still vanishes if $Y$ is replaced with $Z$. But $Z$ was an arbitrary $C^\infty$
vector field along $\xi$ tangent to $\mathcal{N}$ that vanishes at both endpoints. Hence, as
in the proof of Theorem 7.1.1, the fundamental lemma of variational calculus
implies that $\xi$ must be a lifted ray. □

If $\mathcal{N}$ is not only strongly regular but even strongly hyperregular, we can
choose an orientation for $\mathcal{N}$ and we can construct a one-to-one relation
between positively oriented lifted virtual rays and positively oriented virtual
rays according to Proposition 5.2.4. In this situation the action functional,
which is defined by (7.1) on curves in $T^*\mathcal{M}$, gives a well-defined functional
$A$ on the set of all positively oriented virtual rays $\lambda$ via

$$A(\lambda)=\mathcal{A}(\xi). \tag{7.8}$$

Here $\xi$ is the unique positively oriented lifted virtual ray that projects onto
$\lambda$ and $\mathcal{A}(\xi)$ is defined by (7.1). Therefore, in the strongly hyperregular case
Theorem 7.2.1 can be reformulated as a variational principle for rays, rather
than for lifted rays, in the following way.

Theorem 7.2.2. (Principle of stationary action for rays) Let $\mathcal{N}$ be
a ray-optical structure on $\mathcal{M}$ which is strongly hyperregular and thus
orientable. Choose an orientation for $\mathcal{N}$ and fix a positively oriented virtual
ray $\lambda\colon[s_1,s_2]\to\mathcal{M}$. Consider as allowed variations of $\lambda$ all $C^\infty$ maps
$\kappa\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[s_1,s_2]\to\mathcal{M}$ with $\kappa(0,\cdot)=\lambda$, $\kappa(\varepsilon,s_1)=\lambda(s_1)$,
$\kappa(\varepsilon,s_2)=\lambda(s_2)$ for which the curves $\kappa(\varepsilon,\cdot)$ are virtual rays for all $\varepsilon\in\,]-\varepsilon_0,\varepsilon_0[$.
Then $\lambda$ is a ray if and only if $\frac{d}{d\varepsilon}A(\kappa(\varepsilon,\cdot))\big|_{\varepsilon=0}=0$ for all allowed variations
$\kappa$ of $\lambda$. Here $A$ is defined through (7.8).

The proof follows immediately from Theorem 7.2.1 and Proposition 5.2.4.

If $\mathcal{M}$ is to be interpreted as space (and not as spacetime), Theorem 7.2.1
and Theorem 7.2.2 can be viewed as versions of Fermat’s principle. To put this
rigorously, we consider a stationary ray-optical structure and we assume that,
for some value $\omega_0\in\mathbb{R}$, all the assumptions of Theorem 6.5.1 are satisfied such
that the reduction can be carried through. We can then apply Theorem 7.2.1
to the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$, provided that $\hat{\mathcal{N}}_{\omega_0}$ is strongly
regular. (Criteria for $\hat{\mathcal{N}}_{\omega_0}$ being strongly regular are given in Propositions 6.5.4
and 6.5.5.) The action functional $\mathcal{A}$ of Theorem 7.2.1 equals the optical path
length of Definition 6.5.3 up to the constant frequency factor $\omega_0$. If we
exclude the pathological case $\omega_0=0$, varying $\mathcal{A}$ is equivalent to varying the
optical path length. Thus, Theorem 7.2.1 tells us that, among all lifted virtual rays
between two fixed points in space $\hat{\mathcal{M}}$, the lifted rays are characterized by making the
optical path length stationary. This can be viewed as a version of Fermat’s
principle. If $\hat{\mathcal{N}}_{\omega_0}$ is even strongly hyperregular, Theorem 7.2.2 gives a more
familiar reformulation of this result in terms of rays rather than in terms of
lifted rays. In the non-dispersive case the optical path length can be reinterpreted
as a travel time, according to Proposition 6.5.3, and the rays of $\hat{\mathcal{N}}_{\omega_0}$
are the same for all positive values of $\omega_0$, according to Proposition 6.5.2.

Viewed in this sense, Theorem 7.2.2 covers virtually all known versions 
of Fermat’s principle for stationary situations. Considering stationary vac- 
uum ray-optical structures, as in the first part of Sect. 6.6, reproduces Fer- 
mat’s principle for vacuum light propagation on (conformally) stationary 
Lorentzian manifolds as it is given in many textbooks on general relativ- 
ity, see, e.g., Landau and Lifshitz [76] or, for the static case, Frankel [43] or 
Straumann [136]. Considering stationary ray-optical structures on Minkowski 
space, as in the last part of Sect. 6.6, reproduces all elementary textbook ver- 
sions of Fermat’s principle in ordinary optics. 
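
The most elementary of these textbook versions can be checked numerically. The following Python sketch is only an illustration under assumptions made here (two homogeneous media with refractive indices `n1` and `n2` separated by a flat interface, and hypothetical function names): it minimizes the optical path length of a broken straight path over the crossing point on the interface and then evaluates the residual of Snell’s law at the minimizer.

```python
import math

def optical_path_length(x, n1, n2, a, b, d):
    """Optical path from (0, a) to (d, -b) crossing the interface y = 0 at (x, 0)."""
    return n1 * math.hypot(x, a) + n2 * math.hypot(d - x, b)

def stationary_crossing(n1, n2, a, b, d, iterations=200):
    """Ternary search for the minimizer of the (convex) optical path length on [0, d]."""
    lo, hi = 0.0, d
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if optical_path_length(m1, n1, n2, a, b, d) < optical_path_length(m2, n1, n2, a, b, d):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Hypothetical configuration: vacuum above the interface, glass-like medium below.
n1, n2, a, b, d = 1.0, 1.5, 1.0, 1.0, 2.0
x = stationary_crossing(n1, n2, a, b, d)

sin1 = x / math.hypot(x, a)            # sine of the angle of incidence
sin2 = (d - x) / math.hypot(d - x, b)  # sine of the angle of refraction
snell_residual = n1 * sin1 - n2 * sin2
```

Up to the limited accuracy of the search, `snell_residual` vanishes, i.e., the path of stationary optical length obeys $n_1\sin\theta_1=n_2\sin\theta_2$.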

Theorems 7.2.1 and 7.2.2 also apply to some (necessarily dispersive)
ray-optical structures on spacetimes, e.g., to those of Example 5.1.2 giving light 
propagation in a non-magnetized plasma on a general-relativistic spacetime. 
In those cases, however, they cannot be interpreted as versions of Fermat’s 
principle since the trial curves have fixed endpoints not only in space but 
even in spacetime. 



7.3 Fermat’s principle 

Now we want to ask if light rays in an arbitrary general-relativistic medium
can be characterized by a version of Fermat’s principle. Throughout this
section we presuppose a Lorentzian manifold $(\mathcal{M},g)$ with $\dim(\mathcal{M})\ge 2$, as in
Chap. 6. Our physical interpretation refers to the case $\dim(\mathcal{M})=4$ where
$(\mathcal{M},g)$ can be viewed as a general-relativistic spacetime. For an arbitrary
ray-optical structure $\mathcal{N}$ on $\mathcal{M}$, the results of Sects. 7.1 and 7.2 can then be
summarized in the following way.

In any case, the lifted rays of $\mathcal{N}$ are characterized by the variational
principle of Theorem 7.1.1. This, however, cannot be viewed as a version of
Fermat’s principle because the space of trial curves is too big, as outlined
above. If $\mathcal{N}$ is strongly regular, which by Corollary 5.4.1 can hold only if $\mathcal{N}$
describes light propagation in a dispersive medium, the lifted rays of $\mathcal{N}$ are
characterized by the variational principle of Theorem 7.2.1. This, however,
cannot be interpreted as a version of Fermat’s principle either because the
end-points of the trial curves are fixed in spacetime rather than in space.
The results of the preceding sections give a version of Fermat’s principle
only in the very special case that $\mathcal{N}$ is stationary. More precisely, we have
to assume in addition that all the conditions of the reduction theorem (i.e.,
of Theorem 6.5.1) are satisfied for some $\omega_0\in\mathbb{R}\setminus\{0\}$ and that the reduced
ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ is strongly regular. Then Theorem 7.2.1 applied to
$\hat{\mathcal{N}}_{\omega_0}$ gives us a version of Fermat’s principle for the lifted rays with frequency
constant $\omega_0$ in the medium considered. If $\hat{\mathcal{N}}_{\omega_0}$ is even strongly hyperregular,
this result can be reformulated as a variational principle for rays, rather than
for lifted rays, according to Theorem 7.2.2.

If $\mathcal{N}$ is a non-stationary ray-optical structure on $(\mathcal{M},g)$, our previous
results do not give us a version of Fermat’s principle for its rays or lifted rays.
Therefore we have to formulate another variational principle which needs
some preparation. The first step is to define the space of trial curves.

According to the elementary formulation given at the beginning of Chap. 7,
Fermat’s principle requires us to consider motions “at the velocity of light”. In
our setting this translates into considering (lifted) virtual rays of $\mathcal{N}$. For
convenience we shall restrict to (lifted) virtual rays which are defined on the
fixed parameter interval $[0,1]$.

Then we have to impose boundary conditions by fixing “two points in
space”. The appropriate way to translate this into a spacetime setting is to
fix a point $q$ and a time-like curve $\gamma$ in spacetime $\mathcal{M}$, and to restrict to virtual
rays $\lambda$ that start at $q$ and terminate on $\gamma$. Here $q$ can be interpreted as “a point in
space at a particular time”; $\gamma$ can be interpreted as “a point in space viewed
over some time interval”. Allowing the endpoint to float along a time-like
curve is necessary since Fermat’s principle requires varying the arrival time.
Further physical motivation for considering light rays between a point and a
time-like curve will be given in Sect. 8.4 below when we discuss
gravitational lensing.
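
To illustrate why the endpoint has to float along the time-like curve, the following Python sketch considers vacuum light propagation in (2+1)-dimensional Minkowski space with $c=1$. Everything specific in it is an assumption chosen here for illustration: the event $q$ is the origin, the observer moves uniformly on the worldline $t\mapsto(t,\,x_0+vt,\,y_0)$, and the trial curves are broken light-like curves through one spatial waypoint (piecewise smooth, hence only a crude stand-in for the $C^\infty$ trial curves used in the text). The function `arrival_time` is a hypothetical helper that returns the observer’s time at which such a curve arrives.

```python
import math

def arrival_time(waypoint, x0=3.0, y0=1.0, v=0.5):
    """Arrival time on the observer's worldline of a broken light-like curve.

    The curve starts at the event (t, x, y) = (0, 0, 0), runs at the speed of
    light straight to `waypoint`, then straight on until it meets the observer,
    whose worldline is t -> (t, x0 + v * t, y0).
    """
    px, py = waypoint
    t1 = math.hypot(px, py)      # time at which the waypoint is reached
    dx = x0 + v * t1 - px        # observer position relative to the waypoint at time t1
    dy = y0 - py
    # Remaining travel time u >= 0 solves u**2 = (dx + v*u)**2 + dy**2.
    disc = v * v * dx * dx + (1.0 - v * v) * (dx * dx + dy * dy)
    u = (v * dx + math.sqrt(disc)) / (1.0 - v * v)
    return t1 + u

# Unbroken light ray: waypoint at the emission point itself.
t_direct = arrival_time((0.0, 0.0))
x_hit = 3.0 + 0.5 * t_direct     # observer's x-position at the arrival event
py_star = 1.0 / x_hit            # height at which the unbroken ray crosses the line x = 1

# Vary the waypoint along the line x = 1 around the unbroken ray.
h = 1e-4
slope = (arrival_time((1.0, py_star + h)) - arrival_time((1.0, py_star - h))) / (2 * h)
detour = arrival_time((1.0, py_star + 0.5))
```

Within this family the arrival time has vanishing first derivative exactly at the waypoint lying on the unbroken ray, and any detour arrives later. Different trial curves meet the observer at different events, which is why only a time-like curve, and not a single spacetime point, can serve as the second boundary condition.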

Finally, as we want to include dispersive media, it is necessary to “fix the
frequency”. This requires choosing an observer field. More precisely, what
we shall need is not exactly a time-like vector field on all of $\mathcal{M}$ but rather
a time-like vector field along each virtual ray from $q$ to $\gamma$. We introduce the
following definition, see Figure 7.3.

Definition 7.3.1. Let $\gamma$ be a time-like curve in the Lorentzian manifold
$(\mathcal{M},g)$. A generalized observer field for $\gamma$ is a map $W$ that assigns to each
virtual ray $\lambda$ that terminates on $\gamma$ a time-like $C^\infty$ vector field $W_\lambda$ along
$\lambda$ that coincides with $\dot\gamma$ at the end-point. I.e., if $\lambda\colon[0,1]\to\mathcal{M}$ terminates
at $\lambda(1)=\gamma(T(\lambda))$, with $T(\lambda)$ denoting some parameter value, then
$W_\lambda\colon[0,1]\to T\mathcal{M}$ satisfies the conditions $W_\lambda(s)\in T_{\lambda(s)}\mathcal{M}$ for all $s\in[0,1]$
and $W_\lambda(1)=\dot\gamma(T(\lambda))$.

This definition generalizes the notion of observer fields in the following
sense. If $W$ is an ordinary observer field, i.e., a time-like $C^\infty$ vector field on
$\mathcal{M}$, and if $\gamma$ is an integral curve of $W$, then the assignment $\lambda\mapsto W\circ\lambda$
gives a generalized observer field for $\gamma$. Whereas observer fields do not exist
on Lorentzian manifolds which are not time-orientable, generalized observer
fields always exist for any time-like curve $\gamma$. E.g., we may define $W_\lambda(s)$ by
parallelly transporting the vector $\dot\gamma(T(\lambda))$ along $\lambda$.

Fig. 7.3. A generalized observer field for $\gamma$, as defined in Definition 7.3.1, assigns
to each virtual ray $\lambda$ that terminates on $\gamma$ a time-like vector field $W_\lambda$ along $\lambda$.



Upon choosing a generalized observer field for $\gamma$ we may assign a frequency
$\omega(s)$ to each point $\xi(s)$ of a lifted ray by the equation

$$\omega(s)=-\big(\xi(s)\big)\big(W_\lambda(s)\big), \tag{7.9}$$

provided that $\lambda=\tau^*_{\mathcal{M}}\circ\xi$ terminates on $\gamma$. As, in general, the frequency does
not satisfy a conservation law along rays, the right way to “fix the frequency”
is to prescribe a frequency value $\omega_0$ for the arrival at $\gamma$ and to require that the
redshift law for lifted rays is everywhere satisfied. (Please recall our discussion
of redshift in Sect. 6.2.) Collecting all this together, we are led to defining
the space of trial curves $\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$ in the following way.

Definition 7.3.2. Let $\mathcal{N}$ be an arbitrary ray-optical structure on the
Lorentzian manifold $(\mathcal{M},g)$. Fix

(1) a point $q\in\mathcal{M}$;

(2) a $C^\infty$ embedding $\gamma\colon I\to\mathcal{M}$ from a real interval $I$ into $\mathcal{M}$ such that
the tangent field $\dot\gamma$ is time-like;

(3) a generalized observer field $W$ for $\gamma$;

(4) a non-zero real number $\omega_0$.

Then we define the space of trial curves $\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$ as the set of all
immersions $\xi\colon[0,1]\to T^*\mathcal{M}$ such that

(a) $\xi$ is a lifted virtual ray of $\mathcal{N}$;

(b) $\xi(0)\in T^*_q\mathcal{M}$;

(c) there is a $T(\xi)\in I$ such that $\xi(1)\in T^*_{\gamma(T(\xi))}\mathcal{M}$;

(d) $\big(\xi(1)\big)\big(\dot\gamma(T(\xi))\big)=-\omega_0$;

(e) $\xi$ satisfies the redshift law of lifted rays with respect to the generalized
observer field $W$, i.e., $\Omega_{\xi(s)}\big(\dot\xi(s),Q(s)\big)=0$ for all $C^\infty$ vector fields
$Q\colon[0,1]\to T\mathcal{N}$ along $\xi$ with $T\tau^*_{\mathcal{M}}\circ Q$ parallel to $W_{\tau^*_{\mathcal{M}}\circ\xi}$ along $\tau^*_{\mathcal{M}}\circ\xi$;
please recall (6.18).

In terms of a natural chart and a local Hamiltonian, condition (a) requires
that the representation $(x(s),p(s))$ of $\xi$ has to satisfy (5.10) and (5.11). By
condition (e), the equation

$$\dot p_a(s)\,W^a(s)=-k(s)\,W^a(s)\,\frac{\partial H}{\partial x^a}\big(x(s),p(s)\big) \tag{7.10}$$

has to hold with the same factor $k(s)$ that appears in (5.11); here $W^a(s)$
denotes the components of $W_{\tau^*_{\mathcal{M}}\circ\xi}(s)$.
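
To make the frequency assignment (7.9), which also underlies condition (d), more concrete, the following Python sketch evaluates $\omega=-p_aW^a$ in Minkowski space. The setting is an assumption made here for illustration only: coordinates $(t,x,y,z)$, metric $\mathrm{diag}(-1,1,1,1)$, $c=1$, a light-like momentum covector of a ray travelling in the $+x$ direction, and observers moving along the $x$-axis. The helper names `frequency` and `observer_velocity` are hypothetical.

```python
import math

def frequency(p, W):
    """omega = -p_a W^a: minus the momentum covector applied to the observer's velocity."""
    return -sum(pa * Wa for pa, Wa in zip(p, W))

def observer_velocity(v):
    """Unit time-like vector for an observer moving with speed v along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma, gamma * v, 0.0, 0.0)

omega0 = 2.0
p = (-omega0, omega0, 0.0, 0.0)   # light-like covector: -p_t**2 + p_x**2 = 0

omega_rest = frequency(p, observer_velocity(0.0))    # observer at rest
omega_moving = frequency(p, observer_velocity(0.6))  # observer receding along the ray
```

For the observer at rest the sketch returns $\omega_0$, and for the receding observer it returns $\omega_0\sqrt{(1-v)/(1+v)}$, the familiar longitudinal Doppler factor. The same covector thus yields different frequencies for different observer fields, which is why an observer field must be chosen before a frequency can be fixed.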

In the non-dispersive case, i.e., if $\mathcal{N}$ is dilation-invariant, Proposition 6.5.2
gives a natural one-to-one relation between the spaces $\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$ and
$\mathfrak{M}(\mathcal{N},q,\gamma,W,c\,\omega_0)$ for any constant $c>0$. If $\mathcal{N}$ is not only dilation-invariant
but also reversible, this result carries over to the case $c<0$.

In the stationary case, the distinguished time-like vector field $W\in\mathcal{G}_{\mathcal{N}}$
gives us a generalized observer field $W\colon\lambda\mapsto W\circ\lambda$ for each integral curve
$\gamma$ of $W$. (We hope that the reader will not be confused by our using the same
symbol $W$ for two mathematical objects which are different but related to
each other in an obvious way.) In this special case conditions (d) and (e)
of Definition 7.3.2 are equivalent to saying that the momentum $(\xi(s))(W)$ takes
the constant value $-\omega_0$ along $\xi$. If, in addition, all the assumptions of the
reduction theorem (i.e., of Theorem 6.5.1) are satisfied, conditions (a), (d)
and (e) of Definition 7.3.2 imply that $\hat\xi=\mathrm{red}\circ\xi$ is a lifted virtual ray of the
reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$. In the non-stationary case, however, there
is no reduced ray-optical structure and the trial curves have to be defined in
the rather complicated way of Definition 7.3.2.

With the space of trial curves at hand, we are now ready to introduce the 
functional that is to be extremized. 

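Definition 7.3.3. Under the assumptions of Definition 7.3.2,

$$\mathcal{F}\colon\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)\to\mathbb{R},\qquad \mathcal{F}(\xi)=\frac{1}{\omega_0}\,\mathcal{A}(\xi)+T(\xi) \tag{7.11}$$

is called the generalized optical path length functional. Here $\mathcal{A}$ denotes the
action functional (7.1) and $T$ denotes the arrival time functional defined by
condition (c) of Definition 7.3.2.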

In the non-dispersive case, i.e., if $\mathcal{N}$ is dilation-invariant, the generalized
optical path length reduces to the arrival time, $\mathcal{F}(\xi)=T(\xi)$, owing to
Proposition 5.4.7.

In the stationary case, with $W$ given by the distinguished time-like vector
field $W\in\mathcal{G}_{\mathcal{N}}$ and $\gamma$ an integral curve of $W$, the reduction theorem (i.e.,
Theorem 6.5.1) can be used to rewrite the generalized optical path length in
terms of the reduced ray-optical structure, provided that all the assumptions of
the reduction theorem are satisfied. In that case we find that, for all
$\xi\in\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$, the generalized optical path length $\mathcal{F}(\xi)$ equals the
optical path length of $\mathrm{red}\circ\xi$ in the sense of Definition 6.5.3 plus a constant, owing to
Proposition 6.5.3. This justifies the name “generalized optical path length” for the
functional $\mathcal{F}$.

Our goal is, of course, to prove that the stationary points of the functional
$\mathcal{F}$ are exactly the lifted rays in $\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$. Our previous results suggest
that some kind of regularity condition will be necessary to prove this. The
crucial point is the following lemma.

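Lemma 7.3.1. Assume that all the assumptions of Definition 7.3.2 are satisfied
and fix a lifted virtual ray $\xi\in\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$. Assume that along $\xi$
the regularity condition

$$\det\begin{pmatrix}(H^{ab})&(H^{a})&(W^{a})\\(H^{b})&0&0\\(W^{b})&0&0\end{pmatrix}\neq 0 \tag{7.12}$$

holds in any natural chart and with any local Hamiltonian, where $H^a=\partial H/\partial p_a$,
$H^{ab}=\partial^2H/\partial p_a\partial p_b$, and $W^a$ denotes the components of the vector
field $W_{\tau^*_{\mathcal{M}}\circ\xi}$. Consider as allowed variations of $\xi$ the set of all $C^\infty$ maps
$\eta\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[0,1]\to\mathcal{N}$ with $\eta(0,\cdot)=\xi$ and $\eta(\varepsilon,\cdot)\in\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$
for all $\varepsilon\in\,]-\varepsilon_0,\varepsilon_0[$. Then the following holds true. If $Z\colon[0,1]\to T\mathcal{N}$ is
any $C^\infty$ vector field along $\xi$ with $Z(0)=0$ and $Z(1)=0$, then there is an
allowed variation $\eta$ of $\xi$ such that $T\tau^*_{\mathcal{M}}\circ(Y-Z)$ is a multiple of $W_{\tau^*_{\mathcal{M}}\circ\xi}$ along
$\tau^*_{\mathcal{M}}\circ\xi$. Here $Y$ denotes the variational vector field of $\eta$ which is defined by
(7.3).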

Proof. Let $Z\colon[0,1]\to T\mathcal{N}$ be a $C^\infty$ vector field along $\xi$ with $Z(0)=0$ and
$Z(1)=0$. Fix a variation of $\xi$ in $\mathcal{N}$ with variational vector field equal to $Z$
that keeps the end-points fixed, i.e., fix a $C^\infty$ map $\mu\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[0,1]\to\mathcal{N}$
with $\mu(0,\cdot)=\xi$, $\mu(\,\cdot\,,s)^{\textstyle\cdot}(0)=Z(s)$, $\mu(\varepsilon,0)=\xi(0)$ and $\mu(\varepsilon,1)=\xi(1)$. In
general, $\mu$ will not be an allowed variation of $\xi$. We shall now use this map
$\mu$ for constructing another variation $\eta$ that satisfies all the requirements
of the lemma.

In the first step we give the construction of $\eta$ under the special assumption
that $\xi$ can be covered by the domain of a natural chart and of a local Hamiltonian.
Moreover, we assume that the natural chart is induced by coordinates
$(x^1,\dots,x^n)$ on $\mathcal{M}$ with $W^a=\delta^a_n$ along the central curve $\tau^*_{\mathcal{M}}\circ\xi$. In that
case, our assumption (7.12) allows us to solve the system of equations (5.10),
(5.11) and (7.10) for $p_1,\dots,p_{n-1},k,\dot x^n,\dot p_n$ along this central curve. By continuity,
the same solvability condition is true for curves which are sufficiently
close, i.e., for varied curves with sufficiently small variational parameter $\varepsilon$.
(Here we make use of the fact that our curves are defined on the compact
interval $[0,1]$.) In other words, with $x^1(s),\dots,x^{n-1}(s)$ known, (5.10), (5.11) and
(7.10) give us algebraic equations for $p_1,\dots,p_{n-1}$ and first order differential
equations for $x^n$ and $p_n$ which have to be satisfied by the varied curves to
be constructed. For $\varepsilon\neq0$, the coordinate representation of the curve $\mu(\varepsilon,\cdot)$
will not satisfy the system of equations (5.10), (5.11) and (7.10). However,
we can take the coordinates $x^1(\varepsilon,s),\dots,x^{n-1}(\varepsilon,s)$ from this representation
and determine the quantities $p_1,\dots,p_{n-1},k,\dot x^n,\dot p_n$, locally uniquely, in such
a way that (5.10), (5.11) and (7.10) hold. Together with the boundary values
$x^n(\varepsilon,0)=x^n(0,0)$ and $p_n(\varepsilon,1)=-\omega_0$ this determines a unique curve
$\eta(\varepsilon,\cdot)\colon[0,1]\to\mathcal{N}$ near $\xi$ for all $\varepsilon$ sufficiently small. In this way we get an
allowed variation $\eta$ of $\xi$. Its variational vector field $Y$ satisfies the condition of
$T\tau^*_{\mathcal{M}}\circ(Z-Y)$ being parallel to $W_{\tau^*_{\mathcal{M}}\circ\xi}$ since, by construction, the coordinates
$x^1,\dots,x^{n-1}$ coincide along $\eta(\varepsilon,\cdot)$ and $\mu(\varepsilon,\cdot)$.

If $\xi$ cannot be covered by a single chart of the kind considered above, this
construction must be supplemented by an appropriate matching procedure.
By compactness of the interval $[0,1]$, we can find finitely many intermediary
points, $s_0=0<s_1<\dots<s_{m-1}<s_m=1$, such that, for each $0\le i\le m-1$,
$\xi$ restricted to $[s_i,s_{i+1}]$ can be covered by a chart as considered above.
On each interval $[s_i,s_{i+1}]$ we can then construct $\eta(\varepsilon,\cdot)\colon[s_i,s_{i+1}]\to\mathcal{N}$
as above, with any choice of initial conditions $x^n(s_i)$ and $p_n(s_i)$. There is a
unique choice for these initial conditions such that, by joining these segments
together, we get a continuous map $\eta(\varepsilon,\cdot)\colon[0,1]\to\mathcal{N}$ that satisfies the
boundary conditions (b) and (d) of Definition 7.3.2. To verify that this map
is, indeed, of class $C^\infty$ it suffices to check that the first order coordinate
differential equations by which $\eta(\varepsilon,\cdot)$ is piecewise defined have a tensorial
transformation behavior. This implies that $\eta(\varepsilon,\cdot)$ at all of its points satisfies
an invariant first order differential equation with $C^\infty$ coefficients, so it must
be a $C^\infty$ map. Thus, $\eta$ satisfies all the requirements of the lemma. □

In the stationary case $W\in\mathcal{G}_{\mathcal{N}}$ condition (7.12) is equivalent to strong
regularity of the reduced ray-optical structure, recall Proposition 6.5.4. In the
stationary or non-stationary case, Proposition 6.5.5 gives a useful criterion for
(7.12) to hold. This criterion implies, in particular, that for the vacuum
ray-optical structure, $\mathcal{N}=\mathcal{N}^g$, (7.12) is automatically satisfied for any time-like
$W$.
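
The regularity condition (7.12) can be evaluated directly in a simple case. The following Python sketch is an illustration under assumptions made here, not a statement from Chap. 6: it takes the vacuum Hamiltonian $H=\tfrac12\eta^{ab}p_ap_b$ on Minkowski space with $\eta=\mathrm{diag}(-1,1,1,1)$, so that $H^{ab}=\eta^{ab}$ and $H^a=\eta^{ab}p_b$, builds the bordered matrix of (7.12), and computes its determinant with a small hand-written elimination routine. For this Hamiltonian a short Schur-complement computation suggests that the determinant equals $(p_aW^a)^2$, which the sketch compares against. The helper names `det` and `bordered_matrix` are hypothetical.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            return 0.0                      # singular (up to rounding)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def bordered_matrix(H2, H1, W):
    """Matrix of (7.12): block H^{ab} bordered by the columns and rows H^a and W^a."""
    n = len(H1)
    rows = [list(H2[i]) + [H1[i], W[i]] for i in range(n)]
    rows.append(list(H1) + [0.0, 0.0])
    rows.append(list(W) + [0.0, 0.0])
    return rows

eta_inv = [[-1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
p = [-1.0, 1.0, 0.0, 0.0]                   # light-like momentum covector
H1 = [sum(eta_inv[i][j] * p[j] for j in range(4)) for i in range(4)]

W_timelike = [1.25, 0.75, 0.0, 0.0]         # unit time-like vector (speed 0.6 along x)
W_parallel = list(H1)                       # light-like vector parallel to the ray direction

p_of_W = sum(p[i] * W_timelike[i] for i in range(4))
det_timelike = det(bordered_matrix(eta_inv, H1, W_timelike))
det_parallel = det(bordered_matrix(eta_inv, H1, W_parallel))
```

For the time-like $W$ the determinant is non-zero, in agreement with the statement above. For the light-like vector parallel to the ray direction two columns of the matrix coincide and the determinant vanishes, which indicates that the time-like character of $W$ cannot simply be dropped.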

We are now ready to prove Fermat’s principle. 

Theorem 7.3.1. (Fermat’s Principle) Let all the assumptions of Definition 7.3.2
be satisfied and fix a curve $\xi\in\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$. Consider
as allowed variations of $\xi$ all $C^\infty$ maps $\eta\colon\,]-\varepsilon_0,\varepsilon_0[\,\times[0,1]\to\mathcal{N}$ with
$\eta(0,\cdot)=\xi$ and $\eta(\varepsilon,\cdot)\in\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$ for all $\varepsilon\in\,]-\varepsilon_0,\varepsilon_0[$. Then the
following holds true.

(a) For $\xi$ to be a lifted ray it is necessary that $\frac{d}{d\varepsilon}\mathcal{F}(\eta(\varepsilon,\cdot))\big|_{\varepsilon=0}=0$ for all
allowed variations $\eta$ of $\xi$. Here $\mathcal{F}$ denotes the generalized optical path
length functional defined in Definition 7.3.3.

(b) If the regularity property (7.12) is satisfied along $\xi$, this condition is not
only necessary but also sufficient.

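Proof. Let $\eta$ be an allowed variation of $\xi$ and denote the pertaining variational
vector field by $Y$, as in (7.3). Then a calculation analogous to (7.4) yields

$$\begin{aligned}
\frac{d}{d\varepsilon}\,\mathcal{F}\big(\eta(\varepsilon,\cdot)\big)\Big|_{\varepsilon=0}
&=\frac{1}{\omega_0}\,\frac{d}{d\varepsilon}\,\mathcal{A}\big(\eta(\varepsilon,\cdot)\big)\Big|_{\varepsilon=0}+\frac{d}{d\varepsilon}\,T\big(\eta(\varepsilon,\cdot)\big)\Big|_{\varepsilon=0}\\
&=\frac{1}{\omega_0}\int_0^1\Omega_{\xi(s)}\big(\dot\xi(s),Y(s)\big)\,ds+\frac{1}{\omega_0}\,\theta_{\xi(1)}\big(Y(1)\big)+\frac{d}{d\varepsilon}\,T\big(\eta(\varepsilon,\cdot)\big)\Big|_{\varepsilon=0}.
\end{aligned}\tag{7.13}$$

Here we have used the equation $T\tau^*_{\mathcal{M}}(Y(0))=0$ which follows from boundary
condition (b) of Definition 7.3.2. Now we consider boundary condition (c) of
Definition 7.3.2 which implies that $\tau^*_{\mathcal{M}}\big(\eta(\varepsilon,1)\big)=\gamma\big(T(\eta(\varepsilon,\cdot))\big)$. Differentiation
with respect to $\varepsilon$ at $\varepsilon=0$ yields $T\tau^*_{\mathcal{M}}\big(Y(1)\big)=\dot\gamma\big(T(\xi)\big)\,\frac{d}{d\varepsilon}T\big(\eta(\varepsilon,\cdot)\big)\big|_{\varepsilon=0}$.
Now we apply the covector $\xi(1)$ to this vector equation, and we use boundary
condition (d) of Definition 7.3.2. This shows that the last two terms in (7.13)
cancel, i.e.,

$$\frac{d}{d\varepsilon}\,\mathcal{F}\big(\eta(\varepsilon,\cdot)\big)\Big|_{\varepsilon=0}=\frac{1}{\omega_0}\int_0^1\Omega_{\xi(s)}\big(\dot\xi(s),Y(s)\big)\,ds \tag{7.14}$$

for all allowed variations. If $\xi$ is a lifted ray, the integrand vanishes since the
curves $\eta(\varepsilon,\cdot)$ are confined to $\mathcal{N}$. This proves part (a). To prove part (b) we
choose an arbitrary $C^\infty$ vector field $Z\colon[0,1]\to T\mathcal{N}$ along $\xi$ with $Z(0)=0$
and $Z(1)=0$. By Lemma 7.3.1 we can find an allowed variation $\eta$ with
variational vector field $Y$ such that $T\tau^*_{\mathcal{M}}\circ(Z-Y)$ is a multiple of $W_{\tau^*_{\mathcal{M}}\circ\xi}$
along $\tau^*_{\mathcal{M}}\circ\xi$. By hypothesis, the right-hand side of (7.14) has to vanish. Since
$\xi$ satisfies the redshift condition (e) of Definition 7.3.2 this remains true if
$Y$ is replaced with $Z$. As $Z\colon[0,1]\to T\mathcal{N}$ was an arbitrary $C^\infty$ vector field
along $\xi$ that vanishes at the endpoints, the fundamental lemma of variational
calculus implies that $\xi$ must be a lifted ray. □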

This theorem may be interpreted as Fermat’s principle for light rays in
arbitrary media on arbitrary general-relativistic spacetimes. In other words, as
far as the medium and the underlying spacetime are concerned, Theorem 7.3.1
is the most general version of Fermat’s principle in general relativity. One
may think of further generalizations by considering spatially extended light
sources and receivers (i.e., replacing the point $q$ and the time-like curve $\gamma$ by
higher-dimensional submanifolds) or by relaxing the $C^\infty$ assumption on the
trial curves. We shall not be concerned with the first generalization here, but
we shall be forced to deal with the second one in Sect. 7.5 on Morse theory
below.

Fig. 7.4. Fermat’s principle for vacuum light rays on a Lorentzian spacetime can
be phrased in the following way, see Theorem 7.3.2. Among all light-like curves
from $q$ to $\gamma$ the vacuum light rays are the extremals of the arrival time functional
$T$.

It is an important feature of Theorem 7.3.1, somewhat unfamiliar from
elementary optics, that the trial curves and the solution curves live in the
cotangent bundle rather than in the base manifold. If we want to reformulate
this theorem as a variational principle for rays, rather than for lifted rays, we
have to assume that the map $\xi\mapsto\tau^*_{\mathcal{M}}\circ\xi$ is injective on $\mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_0)$.
This property is related to condition (7.12) like hyperregularity is related
to regularity. We can then reformulate Theorem 7.3.1 in terms of rays, just
as Theorem 7.2.1 could be reformulated as Theorem 7.2.2. If $\mathcal{N}$ is
dilation-invariant in addition, we can free ourselves from the necessity to choose a
generalized observer field $W$ and a frequency constant $\omega_0$. This results in a
considerably simplified version of Fermat’s principle. Such a simplification
is possible, in particular, for the vacuum ray-optical structure $\mathcal{N}=\mathcal{N}^g$.
Roughly speaking, the result is that among all light-like curves between a
point and a time-like curve in a Lorentzian manifold the light-like geodesics
are exactly the curves of stationary arrival time. The precise formulation
reads as follows, cf. Figure 7.4.

Theorem 7.3.2. (Fermat’s principle for vacuum light rays) Let $(\mathcal{M}, g)$
be a Lorentzian manifold. Fix a point $q \in \mathcal{M}$ and a $C^\infty$ embedding
$\gamma\colon I \to \mathcal{M}$ from a real interval $I$ into $\mathcal{M}$ such that the tangent
field of $\gamma$ is $g$-time-like everywhere. Let $\mathcal{L}(q,\gamma)$ denote the set of all
virtual rays of the vacuum ray-optical structure $\mathcal{N} = \mathcal{N}^g$ (i.e., all
$g$-light-like $C^\infty$ curves) $\lambda\colon [0,1] \to \mathcal{M}$ with

(a) $\lambda(0) = q$;

(b) there is a $T(\lambda) \in I$ such that $\lambda(1) = \gamma(T(\lambda))$.

Fix a virtual ray $\lambda \in \mathcal{L}(q,\gamma)$ and consider, as allowed variations of
$\lambda$, all $C^\infty$ maps $\kappa\colon\, ]-\varepsilon_0, \varepsilon_0[\, \times [0,1] \to \mathcal{M}$ with
$\kappa(0,\cdot) = \lambda$ and $\kappa(\varepsilon,\cdot) \in \mathcal{L}(q,\gamma)$ for all
$\varepsilon \in\, ]-\varepsilon_0, \varepsilon_0[$. Then $\lambda$ is a ray of $\mathcal{N}^g$ (i.e., a geodesic) if
and only if

$$ \frac{d}{d\varepsilon}\, T\bigl(\kappa(\varepsilon,\cdot)\bigr)\Big|_{\varepsilon=0} = 0 $$

for all allowed variations $\kappa$ of $\lambda$. Here $T$ denotes the arrival time
functional defined on $\mathcal{L}(q,\gamma)$ by condition (b).

Proof. After choosing a generalized observer field $W$ for $\gamma$ and an arbitrary
real number $\omega_0 \neq 0$, we consider the space of trial curves
$\mathfrak{M}(\mathcal{N}^g, q, \gamma, W, \omega_0)$ as it was introduced in Definition 7.3.2. To our
virtual ray $\lambda \in \mathcal{L}(q,\gamma)$ we assign a map
$\xi \in \mathfrak{M}(\mathcal{N}^g, q, \gamma, W, \omega_0)$ via

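$$ \xi(s) = \frac{1}{k(s)}\, g_{\lambda(s)}\bigl(\dot\lambda(s),\,\cdot\,\bigr) \tag{7.15} $$

where the function $k\colon [0,1] \to \mathbb{R} \setminus \{0\}$ is defined as the solution of the
linear differential equation

$$ g_{\lambda(s)}\bigl(W_{\lambda(s)}, \dot\lambda(s)\bigr)\, \dot k(s) = g_{\lambda(s)}\bigl(W_{\lambda(s)}, \nabla_{\dot\lambda(s)}\dot\lambda\bigr)\, k(s) \tag{7.16} $$

with

$$ k(1) = -\frac{1}{\omega_0}\, g_{\lambda(1)}\bigl(W_{\lambda(1)}, \dot\lambda(1)\bigr) . \tag{7.17} $$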

Please note that the metric function on the left-hand side of (7.16) has no
zeros; so $k$ is, indeed, well-defined. (7.15) expresses the fact that $\xi$ is a
lifted virtual ray of $\mathcal{N}^g$ that projects onto $\lambda$; (7.16) guarantees that
$\xi$ satisfies the redshift condition (e) of Definition 7.3.2 whereas (7.17)
takes care of part (d) of Definition 7.3.2. Hence, $\xi$ is indeed in
$\mathfrak{M}(\mathcal{N}^g, q, \gamma, W, \omega_0)$. The same construction gives a bijective relation
between allowed variations $\kappa$ of $\lambda$ and allowed variations, in the sense of
Theorem 7.3.1, $\eta$ of $\xi$. As $\mathcal{N}^g$ is dilation-invariant, the generalized
optical path length reduces to the arrival time,
$\mathcal{F}(\eta(\varepsilon,\cdot)) = T(\eta(\varepsilon,\cdot)) = T(\kappa(\varepsilon,\cdot))$. Moreover, it is
easy to check that with the vacuum Hamiltonian
$H(x,p) = \frac{1}{2}\, g^{ab}(x)\, p_a p_b$ the regularity condition (7.12) is satisfied
for any time-like $W$ at all points of $\mathcal{N}^g$. Hence,
Theorem 7.3.2 is an immediate consequence of Theorem 7.3.1. □
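
The content of Theorem 7.3.2 can be made tangible in a toy situation that is not taken from the text. The following minimal sketch assumes flat 2+1 dimensional Minkowski spacetime with c = 1, an emission event at the spatial point a, and an observer at rest at the spatial point b. A broken light-like curve through an intermediate spatial point m then arrives at time |m - a| + |b - m| after emission. The names `arrival_time` and `gradient` and all numerical values are hypothetical choices for illustration, and broken curves are used although the theorem itself deals with $C^\infty$ trial curves. The sketch only checks that the arrival time is stationary when m lies on the straight (geodesic) segment and not otherwise.

```python
import math

def arrival_time(m, a=(0.0, 0.0), b=(4.0, 0.0), t_emit=0.0):
    """Arrival time at the resting observer for the broken light-like
    curve a -> m -> b in flat spacetime with c = 1 (assumed toy model)."""
    return t_emit + math.dist(a, m) + math.dist(m, b)

def gradient(m, h=1e-6):
    """Central-difference gradient of the arrival time with respect to m."""
    gx = (arrival_time((m[0] + h, m[1])) - arrival_time((m[0] - h, m[1]))) / (2 * h)
    gy = (arrival_time((m[0], m[1] + h)) - arrival_time((m[0], m[1] - h))) / (2 * h)
    return gx, gy

on_geodesic = (1.5, 0.0)    # intermediate point on the straight segment
off_geodesic = (1.5, 1.0)   # intermediate point away from it

grad_on = gradient(on_geodesic)
grad_off = gradient(off_geodesic)
print("on geodesic :", arrival_time(on_geodesic), grad_on)
print("off geodesic:", arrival_time(off_geodesic), grad_off)
```

In this toy model the stationary curve is also the one of earliest arrival; in a general Lorentzian manifold Theorem 7.3.2 asserts stationarity only.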

The idea to formulate Fermat’s principle for vacuum light rays in the 
version of Theorem 7.3.2 is essentially due to Kovner [74] who emphasized the 
relevance of this result in view of gravitational lensing. We shall comment on 
applications of Fermat’s principle to gravitational lensing in Sect. 8.4 below. 
The first proof of Theorem 7.3.2 was given in Perlick [108]. Later, Perlick and
Piccione [112] proved a more general version of Theorem 7.3.2 where the
point $q$ and the time-like curve $\gamma$ are replaced with higher-dimensional
submanifolds. This generalization may be viewed as a vacuum Fermat principle
for light sources and receivers which have a spatial extension. We shall not
be concerned with this generalization here.

7.4 A Hilbert manifold setting for variational problems

Variational problems can be formulated in two quite different ways. First, 
there is the traditional formulation we have used so far, where variations 
are considered in terms of a parameter $\varepsilon$ and stationarity of a functional is
characterized by vanishing first derivative with respect to $\varepsilon$ for all “allowed
variations”. The advantage of this approach is that it uses nothing but finite
dimensional calculus.

The second method of formulating a variational problem is much more
sophisticated. It consists in representing the functional to be varied as a
differentiable mapping $\mathcal{F}\colon \mathfrak{M} \to \mathbb{R}$ defined on some infinite dimensional
manifold $\mathfrak{M}$ of maps. Typically, $\mathfrak{M}$ is modeled on a (real) Hilbert or Banach
space, e.g. on a Sobolev space. A manifold modeled on a mere Fréchet space might
also do in some case or other. However, this is too weak a structure for most
applications. The elements of $\mathfrak{M}$ are the “trial maps”, i.e., the candidates
among which the solutions of the variational problem are to be determined. In
this formulation stationarity of the action functional is expressed by vanishing
of the Fréchet differential $d\mathcal{F}$, i.e., the solutions to the variational problem
are the critical points of $\mathcal{F}$.

Once a variational problem has been cast into a Hilbert manifold set- 
ting, several interesting results from global analysis become applicable. This 
includes, in particular, the body of theorems known as infinite dimensional 
Morse theory which was developed in papers by Palais [104], by Smale [131],
and by Palais and Smale [105]. Infinite dimensional Morse theory proved
particularly powerful when applied to the geodesic problem on Riemannian
manifolds, see, e.g., Palais [104], Schwartz [130] and Klingenberg [72] [73].
Among other things, this approach allows one to decide whether a geodesic gives
a local minimum, a local maximum, or a saddle-point of the action functional
by counting the conjugate points along the geodesic. Moreover, on complete 
Riemannian manifolds it relates the number of geodesics joining two given 
points to the topology of the underlying manifold. At least partly, similar 
results were known already before infinite dimensional Morse theory came 
into existence. Back in those times it was necessary to use finite dimensional 
approximation techniques, introduced by Marston Morse in the 1930s, which 
are detailed in a well-known book by Milnor [96]. 

Rather than in geodesics on a Riemannian manifold we are interested in
rays of a ray-optical structure. To make infinite dimensional Morse theory
applicable to this situation we have to cast one of the variational problems
treated in the preceding sections into a Hilbert manifold setting. It would
be most desirable to work this out for the general Fermat principle given
in Theorem 7.3.1. Unfortunately, this would be extremely difficult. For this
reason we will content ourselves with establishing a Morse theory for the much
simpler variational principle given in Theorem 7.2.2, i.e., for the principle of
stationary action for rays of a hyperregular ray-optical structure on $\mathcal{M}$. As
outlined above, this can be viewed as a version of Fermat’s principle if $\mathcal{M}$ is
to be interpreted as space. Setting up a Morse theory for the general Fermat
principle of Theorem 7.3.1 will be a challenge for future work. It should be
mentioned, however, that for the vacuum version of this variational principle,
given in Theorem 7.3.2, a Morse theory has been established by Perlick [110]
and, to a fuller extent, by Giannoni, Masiello and Piccione [47] [48].

In Theorem 7.2.2, the trial curves are virtual rays in the sense of
Definition 5.2.4 and thus $C^\infty$ curves. The basic idea to get a Hilbert manifold
of trial curves is to replace the $C^\infty$ condition with a Sobolev $H^r$ condition.
Therefore it will be necessary to recall the definition and some basic
properties of $H^r$ spaces.

Let $C^r([0,1],\mathbb{R}^N)$ denote the set of all $r$ times continuously differentiable
maps from the interval $[0,1]$ to $\mathbb{R}^N$, for any integers $r \ge 0$ and $N \ge 1$. Define

$$ \langle f \mid h \rangle_r = \sum_{i=0}^{r} \int_0^1 f^{(i)}(s) \cdot h^{(i)}(s)\, ds , \tag{7.18} $$

$$ \| f \|_r = \sqrt{\langle f \mid f \rangle_r} \tag{7.19} $$

for all $f, h \in C^r([0,1],\mathbb{R}^N)$, where $f^{(i)}\colon [0,1] \to \mathbb{R}^N$ denotes the $i$-th
derivative of $f$ and the dot stands for the standard Euclidean scalar product on
$\mathbb{R}^N$. The scalar product (7.18) makes $C^r([0,1],\mathbb{R}^N)$ into a real pre-Hilbert
space the completion of which is by definition the Sobolev space
$H^r([0,1],\mathbb{R}^N)$. Instead of (7.18) some other topologically equivalent scalar
product may be used, as is done, e.g., by Palais [104] and by Schwartz [130].
Note that $H^0([0,1],\mathbb{R}^N)$ coincides with the familiar Lebesgue space
$L^2([0,1],\mathbb{R}^N)$. It is easy to check that $C^\infty([0,1],\mathbb{R}^N)$ is dense in
$H^r([0,1],\mathbb{R}^N)$ for all $r \ge 0$.

Integration of the identity

$$ f^{(i)}(s_2) = f^{(i)}(s_1) + \int_{s_1}^{s_2} f^{(i+1)}(s)\, ds \tag{7.20} $$

with respect to $s_1$ from 0 to 1 and application of Schwarz’s inequality quickly
shows, after renaming the arbitrary element $s_2 \in [0,1]$ into $s$, that

$$ |f^{(i)}(s)| \le 2\, \| f \|_r , \qquad 0 \le i \le r-1 \tag{7.21} $$

for all $f \in C^r([0,1],\mathbb{R}^N)$ and $r \ge 1$. Hence, if functions
$f_m \in C^r([0,1],\mathbb{R}^N)$ form a Cauchy sequence with respect to $\| \cdot \|_r$, then
the $f_m$ converge pointwise towards some $f_\infty \in C^{r-1}([0,1],\mathbb{R}^N)$. For this
reason $H^r([0,1],\mathbb{R}^N)$ can be, and will be henceforth, identified with a subset
of $C^{r-1}([0,1],\mathbb{R}^N)$ for all integers $r \ge 1$. The elements of
$H^0([0,1],\mathbb{R}^N) = L^2([0,1],\mathbb{R}^N)$ can, of course, only be identified with
equivalence classes of functions $[0,1] \to \mathbb{R}^N$.
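
The estimate (7.21) can be checked numerically for a concrete function. The following minimal sketch assumes $N = 1$, $r = 1$ and the sample function $f(s) = \sin(2\pi s)$; the helper names `h1_norm` and `sup_norm` and the midpoint-rule quadrature are illustrative choices, not part of the text. It compares the supremum of $|f|$ with $2\,\|f\|_1$ as defined in (7.18) and (7.19).

```python
import math

def h1_norm(f, df, n=20000):
    """Approximate ||f||_1 = sqrt(int_0^1 (f^2 + f'^2) ds), cf. (7.18), (7.19),
    by the midpoint rule with n subintervals."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += (f(s) ** 2 + df(s) ** 2) * h
    return math.sqrt(total)

def sup_norm(f, n=20000):
    """Maximum of |f| on an equidistant grid of [0, 1]."""
    return max(abs(f(k / n)) for k in range(n + 1))

# Sample function (an assumption for illustration) and its derivative.
f = lambda s: math.sin(2 * math.pi * s)
df = lambda s: 2 * math.pi * math.cos(2 * math.pi * s)

norm = h1_norm(f, df)
sup = sup_norm(f)
print("sup |f| =", sup, "  2 ||f||_1 =", 2 * norm)
```

For this function the bound is far from sharp, which is all that is needed for the identification of $H^r$ with a subset of $C^{r-1}$.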

The notion of an $H^r$ curve in our manifold $\mathcal{M}$ is introduced in the
following way. By Whitney’s embedding theorem (see, e.g., Golubitsky and
Guillemin [49], Proposition 5.9) we can find a $C^\infty$ embedding
$j\colon \mathcal{M} \to \mathbb{R}^N$, for some positive integer $N$. With such an embedding $j$ we
define for each integer $r \ge 1$

$$ H^r([0,1],\mathcal{M}) = \{ \lambda\colon [0,1] \to \mathcal{M} \mid j \circ \lambda \in H^r([0,1],\mathbb{R}^N) \} \tag{7.22} $$

where the ring denotes composition of maps as usual. It is easy to see that
the set $H^r([0,1],\mathcal{M})$ does not depend on the embedding $j$ chosen. Moreover,
it is a fundamental result, first stated by Palais and Smale [105], that the
map $\lambda \mapsto j \circ \lambda$ makes the set $H^r([0,1],\mathcal{M})$ into a $C^\infty$ submanifold of
$H^r([0,1],\mathbb{R}^N)$ and that the Hilbert manifold structure thereby induced on
$H^r([0,1],\mathcal{M})$ is equally independent of $j$. For $r = 1$, the proof can be found
in Palais [104] or in Schwartz [130]. Thereupon, the proof for $r > 1$ can be
given by induction. Henceforth we use this result and consider $H^r([0,1],\mathcal{M})$
as a Hilbert $C^\infty$ manifold in its own right, for all integers $r \ge 1$.

Now we repeat the same construction with $\mathcal{M}$ replaced by $T\mathcal{M}$. This
gives, for each integer $r \ge 1$, a Hilbert manifold $H^r([0,1],T\mathcal{M})$ that may be
viewed as the tangent bundle of $H^r([0,1],\mathcal{M})$. More precisely, the tangent
space of $H^r([0,1],\mathcal{M})$ at a point $\lambda \in H^r([0,1],\mathcal{M})$ is given, in the sense
of a natural identification, by

$$ T_\lambda H^r([0,1],\mathcal{M}) = \{ Z \in H^r([0,1],T\mathcal{M}) \mid \tau_{\mathcal{M}} \circ Z = \lambda \} \tag{7.23} $$

for $r \ge 1$. This becomes obvious if the tangent vectors to $H^r([0,1],\mathcal{M})$ are
expressed with the help of an embedding $j\colon \mathcal{M} \to \mathbb{R}^N$ and its tangent map
$Tj\colon T\mathcal{M} \to T\mathbb{R}^N \cong \mathbb{R}^{2N}$.

We now state two simple lemmas which are readily verified with the help
of an embedding $j\colon \mathcal{M} \to \mathbb{R}^N$.

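Lemma 7.4.1. For each integer $r \ge 2$, a $C^\infty$ curve $\lambda\colon [0,1] \to \mathcal{M}$ is in
$H^r([0,1],\mathcal{M})$ if and only if its tangent field is in $H^{r-1}([0,1],T\mathcal{M})$. The map

$$ H^r([0,1],\mathcal{M}) \to H^{r-1}([0,1],T\mathcal{M}) , \qquad \lambda \mapsto \dot\lambda \tag{7.24} $$

is a $C^\infty$ map.

Lemma 7.4.2. For each integer $r \ge 1$ and each $s \in [0,1]$ the evaluation
map

$$ \mathrm{ev}_s\colon H^r([0,1],\mathcal{M}) \to \mathcal{M} , \qquad \lambda \mapsto \lambda(s) \tag{7.25} $$

is a $C^\infty$ map. Its tangent map at a point $\lambda \in H^r([0,1],\mathcal{M})$ is given by

$$ T_\lambda \mathrm{ev}_s\colon T_\lambda H^r([0,1],\mathcal{M}) \to T_{\lambda(s)}\mathcal{M} , \qquad Y \mapsto Y(s) . \tag{7.26} $$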

Furthermore, we shall need the following important result.

Lemma 7.4.3. Consider a $C^\infty$ map $\phi\colon \mathcal{M}_1 \to \mathcal{M}_2$ between two finite
dimensional $C^\infty$ manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$ and fix an integer $r \ge 1$. Then
$\Phi(\lambda) = \phi \circ \lambda$ defines a $C^\infty$ map
$\Phi\colon H^r([0,1],\mathcal{M}_1) \to H^r([0,1],\mathcal{M}_2)$ and the tangent map is given by
$((T\Phi)_\lambda(Z))(s) = (T\phi)_{\lambda(s)}(Z(s))$ for all
$\lambda \in H^r([0,1],\mathcal{M}_1)$, $Z \in T_\lambda H^r([0,1],\mathcal{M}_1)$, and $s \in [0,1]$.

For $r = 1$, the proof can be found in Palais [104] or in Schwartz [130]. For
$r > 1$, the result was first stated in Palais and Smale [105] and can be proven
by induction over $r$.

After these preparations we now turn to the problem we are interested in. 



7.5 A Morse theory for strongly hyperregular 
ray-optical structures 

As indicated above, it is our goal to rephrase Theorem 7.2.2 as a variational 
principle on a Hilbert manifold. To that end we modify our notion of virtual 
rays in two respects. First, we replace the $C^\infty$ condition on virtual rays by
an $H^r$ condition, for an appropriate integer $r$. Second, we choose a global
Hamiltonian and use it for fixing the parametrization of each virtual ray. 
(Since Theorem 7.2.2 presupposes a strongly hyperregular ray-optical struc- 
ture, it only applies to situations where the existence of a global Hamiltonian 
is assured.) We are, thus, led to introduce the following definition. 

Definition 7.5.1. If $\mathcal{N}$ is a ray-optical structure on $\mathcal{M}$ and $H$ a global
Hamiltonian for $\mathcal{N}$, we denote by $\mathfrak{V}(H)$ the set of all maps
$\lambda\colon [0,1] \to \mathcal{M}$ that satisfy the following condition. There is a
$\xi \in H^1([0,1],\mathcal{N})$ and a $c \in \mathbb{R}^+$ such that
$\tau^*_{\mathcal{M}} \circ \xi = \lambda$ and $(\tau^*_{\mathcal{M}} \circ \xi)^{\cdot} = c\, \mathbb{F}H \circ \xi$.

Lemma 7.4.1 and Lemma 7.4.3 imply that $\mathfrak{V}(H) \subset H^2([0,1],\mathcal{M})$. It is
obvious that each $C^\infty$ curve $\lambda \in \mathfrak{V}(H)$ is a virtual ray in the sense of
Definition 5.1.2. Conversely, any such virtual ray which is defined on a
compact parameter interval can be made into an element of $\mathfrak{V}(H)$ by a unique
reparametrization. It is thus justified to say that Definition 7.5.1 translates
our earlier notion of virtual rays into an $H^2$-setting. In natural coordinates,
elements of $\mathfrak{V}(H)$ are represented by exactly those $x \in H^2([0,1],\mathbb{R}^n)$ for
which the system of equations

$$ \dot x^a(s) = c\, \frac{\partial H}{\partial p_a}\bigl(x(s), p(s)\bigr) , \tag{7.27} $$

$$ H\bigl(x(s), p(s)\bigr) = 0 \tag{7.28} $$

admits a solution $c \in \mathbb{R}^+$, $p \in H^1([0,1],\mathbb{R}^n)$. In the strongly hyperregular
case such a solution must be unique.

We write $\mathfrak{V}(H)$, rather than $\mathfrak{V}(\mathcal{N})$, to indicate that this set depends on
the choice of a global Hamiltonian. However, if $H$ and $\tilde H$ are two global
Hamiltonians for one and the same ray-optical structure, there is a natural
one-to-one relation between $\mathfrak{V}(H)$ and $\mathfrak{V}(\tilde H)$ given by relating those curves
to each other which coincide up to reparametrization.

We are now going to show that, in the strongly hyperregular case, $\mathfrak{V}(H)$
carries a natural Hilbert manifold structure.

Proposition 7.5.1. Let $\mathcal{N}$ be a ray-optical structure on $\mathcal{M}$. Assume that $H$
is a global Hamiltonian for $\mathcal{N}$ such that the map
$\sigma_H\colon \mathcal{N} \times \mathbb{R}^+ \to T\mathcal{M}$ defined by (5.16) is a $C^\infty$ diffeomorphism onto its
image. (Please recall that, by Definition 5.2.2, such a global Hamiltonian
exists if and only if $\mathcal{N}$ is strongly hyperregular.) Then $\mathfrak{V}(H)$ is a $C^\infty$
submanifold of $H^2([0,1],\mathcal{M})$.

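Proof. We denote the image of $\sigma_H$ by $\mathcal{C}^+$. Since $\sigma_H$ is a $C^\infty$
diffeomorphism onto its image, its differential has maximal rank. Hence,
$\mathcal{C}^+$ is open in $T\mathcal{M}$. As a consequence Lemma 7.4.1 implies that the set

$$ H^2([0,1],\mathcal{M};\mathcal{C}^+) = \{ \lambda \in H^2([0,1],\mathcal{M}) \mid \dot\lambda \in H^1([0,1],\mathcal{C}^+) \} \tag{7.29} $$

is a $C^\infty$ submanifold of $H^2([0,1],\mathcal{M})$. Now we introduce the map

$$ \chi\colon H^2([0,1],\mathcal{M};\mathcal{C}^+) \to H^1([0,1],\mathbb{R}^+) , \qquad \lambda \mapsto \mathrm{pr}_2 \circ \sigma_H^{-1} \circ \dot\lambda \tag{7.30} $$

where $\mathrm{pr}_2\colon \mathcal{N} \times \mathbb{R}^+ \to \mathbb{R}^+$ denotes the projection onto the second factor.
It follows immediately from Lemma 7.4.1 and Lemma 7.4.3 that $\chi$ is a $C^\infty$
map. This map is defined in such a way that $\mathfrak{V}(H) = \chi^{-1}(\mathbb{R}^+)$ where the set
of constant functions from $[0,1]$ to $\mathbb{R}^+$ is identified with $\mathbb{R}^+$. By
Lemma 7.4.2, $\mathbb{R}^+$ is a $C^\infty$ submanifold of $H^1([0,1],\mathbb{R}^+)$. Now the statement
of the proposition follows if we are able to prove that $\chi$ is a submersion. To
that end we pick an element $\lambda \in \mathfrak{V}(H) \subset H^2([0,1],\mathcal{M};\mathcal{C}^+)$. Then the
equation $\chi(\lambda) = c$ has to hold with some $c \in \mathbb{R}^+$. By continuous extension of
the tangent map

$$ T_\lambda\chi\colon T_\lambda H^2([0,1],\mathcal{M};\mathcal{C}^+) \to T_c H^1([0,1],\mathbb{R}^+) \cong H^1([0,1],\mathbb{R}) \tag{7.31} $$

we get a map

$$ \overline{T_\lambda\chi}\colon \overline{T_\lambda H^2([0,1],\mathcal{M};\mathcal{C}^+)} \to H^0([0,1],\mathbb{R}) , \tag{7.32} $$

where $\overline{T_\lambda H^2([0,1],\mathcal{M};\mathcal{C}^+)}$ denotes the closure of
$T_\lambda H^2([0,1],\mathcal{M};\mathcal{C}^+)$ in $T_\lambda H^1([0,1],\mathcal{M})$. Since $\mathrm{pr}_2$ is homogeneous,
any $f \in H^1([0,1],\mathbb{R})$ has to satisfy the equation

$$ \overline{T_\lambda\chi}(f \dot\lambda) = c \dot f . \tag{7.33} $$

This proves that the image of $T_\lambda\chi$ is dense in $H^1([0,1],\mathbb{R})$. On the other
hand the image of $T_\lambda\chi$ is a closed linear subspace of $H^1([0,1],\mathbb{R})$. Both
observations together imply that the image of $T_\lambda\chi$ is all of
$H^1([0,1],\mathbb{R})$, i.e., that $\chi$ is a submersion at the point $\lambda$. As this result
holds for all $\lambda \in \mathfrak{V}(H)$ we have proven that $\mathfrak{V}(H) = \chi^{-1}(\mathbb{R}^+)$ is a closed
submanifold of $H^2([0,1],\mathcal{M};\mathcal{C}^+)$ and, thus, a submanifold of
$H^2([0,1],\mathcal{M})$. □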

If the assumption of strong hyperregularity is satisfied, we can use this
result and view $\mathfrak{V}(H)$ as a Hilbert $C^\infty$ manifold in its own right. To impose
boundary conditions we need the following proposition.

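Proposition 7.5.2. Under the assumptions of Proposition 7.5.1, the map

$$ \Pi\colon \mathfrak{V}(H) \to \mathcal{M} \times \mathcal{M} , \qquad \lambda \mapsto (\lambda(0), \lambda(1)) \tag{7.34} $$

is a $C^\infty$ submersion. Thus,

$$ \mathfrak{V}(H;q) = \{ \lambda \in \mathfrak{V}(H) \mid \lambda(0) = q \} \tag{7.35} $$

is a closed $C^\infty$ submanifold of $\mathfrak{V}(H)$, and

$$ \mathfrak{V}(H;q,q') = \{ \lambda \in \mathfrak{V}(H) \mid \lambda(0) = q,\ \lambda(1) = q' \} \tag{7.36} $$

is either empty or a closed $C^\infty$ submanifold of $\mathfrak{V}(H;q)$ for any $q$ and $q'$ in
$\mathcal{M}$.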

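Proof. By Lemma 7.4.2, $\Pi$ is a $C^\infty$ map. To prove that $\Pi$ is a submersion,
we pick any element $\lambda \in \mathfrak{V}(H)$. Then the equation $\chi(\lambda) = c$ has to hold
with some $c \in \mathbb{R}^+$ where the map $\chi$ is defined by (7.30). Again by
Lemma 7.4.2, the tangent map of $\Pi$ at the point $\lambda$ is given by

$$ T_\lambda\Pi\colon T_\lambda\mathfrak{V}(H) \subset T_\lambda H^2([0,1],\mathcal{M}) \to T_{\lambda(0)}\mathcal{M} \times T_{\lambda(1)}\mathcal{M} , \qquad Y \mapsto (Y(0), Y(1)) . $$

We consider the continuous extension

$$ \overline{T_\lambda\Pi}\colon \overline{T_\lambda\mathfrak{V}(H)} \to T_{\lambda(0)}\mathcal{M} \times T_{\lambda(1)}\mathcal{M} $$

of $T_\lambda\Pi$, where $\overline{T_\lambda\mathfrak{V}(H)}$ denotes the closure of $T_\lambda\mathfrak{V}(H)$ in
$T_\lambda H^1([0,1],\mathcal{M})$.

Let $Y$ be an arbitrary element of $T_\lambda H^2([0,1],\mathcal{M})$. So, in particular, $Y(0)$
and $Y(1)$ are arbitrary vectors in $T_{\lambda(0)}\mathcal{M}$ and $T_{\lambda(1)}\mathcal{M}$, respectively.
We want to find a function $f \in H^1([0,1],\mathbb{R})$ such that

$$ Y + f \dot\lambda \in \overline{T_\lambda\mathfrak{V}(H)} . \tag{7.37} $$

Since we know from the proof of Proposition 7.5.1 that $\mathfrak{V}(H) = \chi^{-1}(\mathbb{R}^+)$,
the function $f$ satisfies (7.37) if and only if

$$ \overline{T_\lambda\chi}(Y + f \dot\lambda) = \overline{T_\lambda\chi}(Y) + c \dot f = \mathrm{const.} \tag{7.38} $$

For any choice of const., (7.38) has a unique solution $f \in H^1([0,1],\mathbb{R})$ with
$f(0) = 0$. By integrating (7.38) from 0 to 1 we see that there is a unique
choice for const. such that the corresponding solution $f$ satisfies the boundary
condition $f(1) = 0$. With this function $f$ we get

$$ \overline{T_\lambda\Pi}(Y + f \dot\lambda) = (Y(0), Y(1)) \tag{7.39} $$

which proves that $\overline{T_\lambda\Pi}$ is surjective, i.e., that the image of $T_\lambda\Pi$ is
dense in $T_{\lambda(0)}\mathcal{M} \times T_{\lambda(1)}\mathcal{M}$. On the other hand, the image of $T_\lambda\Pi$
is a closed subspace of $T_{\lambda(0)}\mathcal{M} \times T_{\lambda(1)}\mathcal{M}$. Thus, $\Pi$ is a
submersion. □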

To define the action functional on $\mathfrak{V}(H)$, we have to recall that, by
Proposition 5.2.4, for a strongly hyperregular ray-optical structure there is a
one-to-one relation between virtual rays and lifted virtual rays. Translated
into our $H^r$-setting, this gives rise to the following proposition.

Proposition 7.5.3. Under the assumptions of Proposition 7.5.1, the map

$$ \Xi\colon \mathfrak{V}(H) \to H^1([0,1],T^*\mathcal{M}) , \qquad \lambda \mapsto \mathrm{pr}_1 \circ \sigma_H^{-1} \circ \dot\lambda \tag{7.40} $$

is injective and of class $C^\infty$. Here $\mathrm{pr}_1\colon \mathcal{N} \times \mathbb{R}^+ \to \mathcal{N}$ denotes the
projection onto the first factor.

Proof. Lemma 7.4.1 and Lemma 7.4.3 guarantee that $\Xi$ is a $C^\infty$ map. By
Proposition 5.2.4, the restriction of $\Xi$ to $\mathfrak{V}(H) \cap C^\infty([0,1],\mathcal{M})$ is
injective. By continuous extension, $\Xi$ must be injective. □

It is not difficult to show that, moreover, $\Xi$ is an immersion. In natural
coordinates $\Xi$ is represented by the map $x \mapsto (x,p)$ given by solving (7.27)
and (7.28) for $c \in \mathbb{R}^+$ and $p \in H^1([0,1],\mathbb{R}^n)$. Hence, for every
$Z \in T_\lambda\mathfrak{V}(H)$ with coordinate representation $\delta x \in H^2([0,1],\mathbb{R}^n)$, the
coordinate representation $(\delta x, \delta p) \in H^1([0,1],\mathbb{R}^{2n})$ of the vector
$T\Xi(Z)$ satisfies the system of equations

$$ \delta\Bigl(\dot x^a - c\, \frac{\partial H}{\partial p_a}(x,p)\Bigr)(s) = 0 , \tag{7.41} $$

$$ \delta\bigl(H(x,p)\bigr)(s) = 0 , \tag{7.42} $$

with some $\delta c \in \mathbb{R}$. As our notation suggests, one should think of $\delta$ as the
derivative with respect to a variational parameter $\varepsilon$ at the point
$\varepsilon = 0$, and one should calculate the left-hand sides of (7.41) and (7.42)
with the help of the ordinary chain rule and product rule. This procedure is
justified by Lemma 7.4.3. Moreover, $\delta$ and $(\,\cdot\,)^{\cdot}$ commute since the
derivative map from $H^{r+1}([0,1],\mathbb{R}^n)$ to $H^r([0,1],\mathbb{R}^n)$ is linear.

We are now ready to define the action functional $\mathcal{A}$ on $\mathfrak{V}(H)$, i.e., to
translate (7.8) into our $H^2$-setting.

Proposition 7.5.4. Let all the assumptions of Proposition 7.5.1 be satisfied.
Then the action functional $\mathcal{A}\colon \mathfrak{V}(H) \to \mathbb{R}$ defined by

$$ \mathcal{A}(\lambda) = \int_0^1 \bigl((\Xi(\lambda))(\dot\lambda)\bigr)(s)\, ds , \tag{7.43} $$

is a $C^\infty$ map. Here $\Xi$ denotes the map introduced in Proposition 7.5.3.

Proof. $\mathcal{A}$ is the composition of the following three maps.

$$ \mathfrak{V}(H) \to H^1([0,1], T\mathcal{M} \oplus T^*\mathcal{M}) \to H^1([0,1],\mathbb{R}) \to \mathbb{R} \tag{7.44} $$

$$ \lambda \mapsto \bigl(\dot\lambda, \Xi(\lambda)\bigr) \mapsto (\Xi(\lambda))(\dot\lambda) \mapsto \int_0^1 \bigl((\Xi(\lambda))(\dot\lambda)\bigr)(s)\, ds $$

Here $T\mathcal{M} \oplus T^*\mathcal{M}$ denotes the fiber bundle over $\mathcal{M}$ whose fiber at
$q \in \mathcal{M}$ is $T_q\mathcal{M} \times T^*_q\mathcal{M}$. The first map in this sequence is a $C^\infty$ map
owing to Lemma 7.4.1 and Proposition 7.5.3. The second map is, again, a $C^\infty$
map as can be seen by applying Lemma 7.4.3 to the natural pairing map between
vectors and covectors. Finally, the last map in the sequence is obviously linear
and, with the help of inequality (7.21), it is easy to check that it is continuous.
Thus, $\mathcal{A}$ is the composition of three $C^\infty$ maps. □

If $H$ and $\tilde H$ are two global Hamiltonians for a ray-optical structure $\mathcal{N}$
both of which satisfy the assumptions of Proposition 7.5.1, Proposition 7.5.4
gives us a $C^\infty$ action functional on $\mathfrak{V}(H)$ and on $\mathfrak{V}(\tilde H)$. If we identify
$\mathfrak{V}(H)$ and $\mathfrak{V}(\tilde H)$ in the way outlined above, these two action functionals
are easily shown to be related in the following way. From Proposition 5.1.3 we
know that $H$ and $\tilde H$ satisfy, on their common domain of definition, an equation
of the form $\tilde H = F H$ with a nowhere vanishing function $F$. If $F$ is positive,
the action functionals on $\mathfrak{V}(H)$ and $\mathfrak{V}(\tilde H)$ coincide. If $F$ is negative, they
differ by sign.

$\mathcal{A}$ can be represented with the help of natural coordinates in the form

$$ \mathcal{A}(\lambda) = \int_0^1 p_a(s)\, \dot x^a(s)\, ds . \tag{7.45} $$

Here $x \in H^2([0,1],\mathbb{R}^n)$ denotes the coordinate representation of
$\lambda \in \mathfrak{V}(H)$ and $(x,p) \in H^1([0,1],\mathbb{R}^{2n})$ denotes the coordinate
representation of $\Xi(\lambda)$ in the notation of Proposition 7.5.3. In other words,
the function $s \mapsto p(s)$ is determined by solving the system of equations (7.27)
and (7.28) for $c \in \mathbb{R}^+$ and $p \in H^1([0,1],\mathbb{R}^n)$. We may use the representation
(7.45) even if $\lambda$ cannot be covered by a single chart. We just have to read the
integrand as an invariant function that takes the given form locally in any
natural chart.
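
As a small numerical illustration of (7.45), the following sketch evaluates the action for one concrete case. It assumes the Euclidean plane and the Hamiltonian $H(x,p) = \frac{1}{2}(|p|^2 - 1)$, which is an illustrative choice and not necessarily the Hamiltonian of any example in the text. For this choice (7.27) reads $\dot x = c\, p$ and (7.28) forces $|p| = 1$, so $c$ is the constant speed of the curve and the action should reduce to the Euclidean length. The curve (a quarter circle of radius `R` traversed at constant speed) and the helper names `x_dot`, `lift` and `action` are hypothetical.

```python
import math

R = 2.0  # radius of the sample arc (hypothetical choice)

def x_dot(s):
    """Velocity of the constant-speed quarter circle
    x(s) = R * (cos(pi*s/2), sin(pi*s/2)), s in [0, 1]."""
    w = math.pi / 2
    return (-R * w * math.sin(w * s), R * w * math.cos(w * s))

def lift(s):
    """Solve x_dot = c * dH/dp = c * p together with H = (|p|^2 - 1)/2 = 0
    for the assumed Hamiltonian; returns (c, p)."""
    v = x_dot(s)
    c = math.hypot(v[0], v[1])      # |p| = 1 forces c = |x_dot|
    p = (v[0] / c, v[1] / c)
    return c, p

def action(n=1000):
    """Midpoint-rule approximation of int_0^1 p_a(s) x_dot^a(s) ds, cf. (7.45)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        _, p = lift(s)
        v = x_dot(s)
        total += (p[0] * v[0] + p[1] * v[1]) * h
    return total

c_values = [lift(s)[0] for s in (0.0, 0.3, 1.0)]  # should all agree
A = action()
arc_length = R * math.pi / 2
print("c =", c_values, " action =", A, " length =", arc_length)
```

The constancy of `c_values` reflects the requirement in Definition 7.5.1 that one and the same constant $c$ works along the whole curve, which fixes the parametrization.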

Now we view $\mathcal{A}$ as a composition of three maps, as in the proof of
Proposition 7.5.4, and we apply Lemma 7.4.3. This shows that the Fréchet
differential $(d\mathcal{A})_\lambda\colon T_\lambda\mathfrak{V}(H) \to \mathbb{R}$ can be derived from (7.45) by
calculating, in the usual way of differential calculus, the derivative with
respect to a variational parameter $\varepsilon$ under the integral. Denoting this
derivative at $\varepsilon = 0$ by $\delta$ we get

$$
\begin{aligned}
(d\mathcal{A})_\lambda(Z) &= \int_0^1 \bigl(\delta p_a\, \dot x^a + p_a\, \delta\dot x^a\bigr)(s)\, ds \\
&= \int_0^1 \Bigl(\delta p_a\, c\, \frac{\partial H}{\partial p_a}(x,p) - \dot p_a\, \delta x^a\Bigr)(s)\, ds + p_a(1)\, \delta x^a(1) - p_a(0)\, \delta x^a(0) \\
&= \int_0^1 \Bigl(- c\, \frac{\partial H}{\partial x^a}(x,p)\, \delta x^a - \dot p_a\, \delta x^a\Bigr)(s)\, ds + p_a(1)\, \delta x^a(1) - p_a(0)\, \delta x^a(0)
\end{aligned}
\tag{7.46}
$$

for $Z \in T_\lambda\mathfrak{V}(H)$. Here we have used (7.27) and (7.42). As in (7.41) and
(7.42), $\delta x \in H^2([0,1],\mathbb{R}^n)$ represents $Z$ and
$(\delta x, \delta p) \in H^1([0,1],\mathbb{R}^{2n})$ represents $T\Xi(Z)$. It is now easy to prove the
desired $H^r$-reformulation of Theorem 7.2.2.



Theorem 7.5.1. Consider the situation of Proposition 7.5.1. Let $\mathcal{A}_{q,q'}$
denote the restriction of the action functional $\mathcal{A}$, defined in
Proposition 7.5.4, to the Hilbert manifold $\mathfrak{V}(H;q,q')$ of Proposition 7.5.2.
Then $\lambda \in \mathfrak{V}(H;q,q')$ is a ray if and only if the Fréchet differential
$(d\mathcal{A}_{q,q'})_\lambda$ is equal to zero.

Proof. For $\lambda \in \mathfrak{V}(H;q,q')$, the differential $(d\mathcal{A}_{q,q'})_\lambda$ is equal to zero
if and only if $(d\mathcal{A})_\lambda(Z) = 0$ for all $Z \in T_\lambda\mathfrak{V}(H)$ with $Z(0) = 0$ and
$Z(1) = 0$. By continuous extension, this is the case if and only if the last
line in (7.46) vanishes for all $\delta x \in H^1([0,1],\mathbb{R}^n)$ with $\delta x(0) = 0$ and
$\delta x(1) = 0$ that represent elements $Z \in \overline{T_\lambda\mathfrak{V}(H)}$, where
$\overline{T_\lambda\mathfrak{V}(H)}$ denotes the closure of $T_\lambda\mathfrak{V}(H)$ in $T_\lambda H^1([0,1],\mathcal{M})$.
From the proof of Proposition 7.5.2 we know that those $\delta x$ are of the form
$\delta x^a(s) = y^a(s) + f(s)\, \dot x^a(s)$, where $y$ is an arbitrary element of
$H^2([0,1],\mathbb{R}^n)$ with $y(0) = 0$ and $y(1) = 0$ and $f$ is some function in
$H^1([0,1],\mathbb{R})$ with $f(0) = f(1) = 0$. However, the term proportional to the
tangent field gives no contribution to the last integral in (7.46) as is
easily verified with the help of (7.27) and (7.28). Hence, the Fréchet
differential $(d\mathcal{A}_{q,q'})_\lambda$ is zero if and only if the last integral in (7.46)
vanishes for all $\delta x \in H^2([0,1],\mathbb{R}^n)$ with $\delta x(0) = 0$ and $\delta x(1) = 0$.
Owing to the fundamental lemma of variational calculus, generalized into an
$H^2$-setting by continuous extension, this is the case if and only if

$$ \dot p_a(s) = - c\, \frac{\partial H}{\partial x^a}\bigl(x(s), p(s)\bigr) , \tag{7.47} $$

i.e., if and only if the map $s \mapsto (x(s), p(s))$ satisfies not only (7.27) and
(7.28) but all the ray equations. This concludes the proof since, by induction,
$s \mapsto (x(s), p(s))$ must then be an $H^r$ map for all $r \in \mathbb{N}$, i.e., it must be a
$C^\infty$ map. □

With this theorem we have achieved our goal of formulating a variational
principle for rays of a strongly hyperregular ray-optical structure in a
Hilbert manifold setting. In a nutshell, the rays from $q$ to $q'$ are the critical
points of the functional $\mathcal{A}_{q,q'}\colon \mathfrak{V}(H;q,q') \to \mathbb{R}$, i.e., the points where
this functional has a local minimum, a local maximum, or a saddle point. These
three types of critical points can be distinguished by looking at the second
derivative of $\mathcal{A}_{q,q'}$, i.e., at the Hessian of $\mathcal{A}_{q,q'}$ at $\lambda$,

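$$ \mathrm{Hess}_\lambda\mathcal{A}_{q,q'}\colon T_\lambda\mathfrak{V}(H;q,q') \times T_\lambda\mathfrak{V}(H;q,q') \to \mathbb{R} , \qquad (X_\lambda, Y_\lambda) \mapsto X_\lambda\bigl(Y \mathcal{A}_{q,q'}\bigr) . \tag{7.48} $$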

Here $X$ and $Y$ denote any $C^\infty$ vector fields (i.e., derivations) on
$\mathfrak{V}(H;q,q')$ with values $X_\lambda$ and $Y_\lambda$, respectively, at the point $\lambda$. If
$(d\mathcal{A}_{q,q'})_\lambda = 0$, the Hessian is indeed well-defined (i.e., it depends only
on the values of $X$ and $Y$ at the point $\lambda$), and it gives a symmetric
continuous bilinear form on the Hilbert space $T_\lambda\mathfrak{V}(H;q,q')$.

Clearly, if the Hessian is non-degenerate, it characterizes the critical point 
in the following way. Depending on whether the Hessian is positive definite, 
negative definite, or indefinite, the critical point is a strict local minimum, a 
strict local maximum, or a saddle point. If the Hessian is degenerate, third or 
higher order derivatives have to be considered for characterizing the critical 
point. 

In this connection it is helpful to recall the following standard terminology. 
For a symmetric continuous bilinear form $\Phi\colon \mathfrak{H} \times \mathfrak{H} \to \mathbb{R}$ on a Hilbert
space $\mathfrak{H}$, the index $\mathrm{ind}(\Phi)$ is, by definition, the maximal dimension of a
subspace of $\mathfrak{H}$ on which $\Phi$ is negative definite. The extended index
$\mathrm{ind}_0(\Phi)$ is, by definition, the maximal dimension of a subspace of $\mathfrak{H}$ on
which $\Phi$ is negative semidefinite. The nullity $\mathrm{null}(\Phi)$ is the dimension of
the kernel of $\Phi$. Then

$$ \mathrm{ind}_0(\Phi) = \mathrm{ind}(\Phi) + \mathrm{null}(\Phi) \tag{7.49} $$

since, by Hilbert space algebra, the orthocomplement of the kernel of $\Phi$ can
be decomposed into two orthogonal subspaces on which $\Phi$ is positive definite
and negative definite, respectively.

For a $C^\infty$ (or, more generally, $C^2$) function on a Hilbert manifold, the
index of the Hessian at a critical point is called the Morse index of the critical
point. If the Hessian is non-degenerate at all critical points, the function is
called a Morse function. Clearly, the Hessian at a critical point $\lambda$ is
non-degenerate if and only if the extended Morse index of $\lambda$ coincides with the
Morse index of $\lambda$. A strict local minimum has vanishing Morse index (but
not necessarily vanishing extended Morse index). Conversely, if the extended
Morse index of $\lambda$ vanishes, $\lambda$ is a strict local minimum.

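In terms of natural coordinates the Hessian of $\mathcal{A}_{q,q'}$ at a critical point
$\lambda$ can be calculated from (7.46) with $\delta x(0) = 0$ and $\delta x(1) = 0$ by
differentiating with respect to another variational parameter $\varepsilon'$. If we
write $\delta'$ for this derivative at $\varepsilon' = 0$, we find

$$
\begin{aligned}
\mathrm{Hess}_\lambda\mathcal{A}_{q,q'}(Z',Z) &= - \int_0^1 \delta'\Bigl(\dot p_a + c\, \frac{\partial H}{\partial x^a}(x,p)\Bigr)(s)\, \delta x^a(s)\, ds \\
&= - \int_0^1 \delta\Bigl(\dot p_a + c\, \frac{\partial H}{\partial x^a}(x,p)\Bigr)(s)\, \delta' x^a(s)\, ds
\end{aligned}
\tag{7.50}
$$

for $Z, Z' \in T_\lambda\mathfrak{V}(H;q,q')$. Here $(\delta x, \delta p)$ and $(\delta' x, \delta' p)$ denote the
coordinate representations of $T\Xi(Z)$ and $T\Xi(Z')$, respectively. The first
equality in (7.50) holds since $\lambda$ is a ray which implies that (7.47) is
satisfied. The second equality in (7.50) holds since the Hessian is symmetric.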

We shall now give a criterion for $\mathrm{Hess}_\lambda\mathcal{A}_{q,q'}$ to be non-degenerate. To
that end we use the notion of Jacobi fields which was introduced, for arbitrary
ray-optical structures, in Definition 5.6.2.

Theorem 7.5.2. Let, in the situation of Theorem 7.5.1, $\lambda \in \mathfrak{V}(H;q,q')$ be
a ray and $Z \in T_\lambda\mathfrak{V}(H;q,q')$. Then $Z$ is in the kernel of
$\mathrm{Hess}_\lambda\mathcal{A}_{q,q'}$ if and only if $Z$ is a Jacobi field along $\lambda$.



Proof. As in the proof of Theorem 7.5.1, we work in natural coordinates
for notational convenience. The following argument is valid independently of
whether or not the considered curve can be covered by a single chart.

Please recall that the coordinate representation $(\delta x, \delta p)$ of $T\Xi(Z)$ has
to satisfy (7.41) and (7.42). If $Z$ is a Jacobi field, (5.46), (5.47) and (5.48)
must be true. Comparison shows that the function $\delta k$ must be equal to the
constant $\delta c$ such that (5.48) takes the form

$$ \delta\Bigl(\dot p_a + c\, \frac{\partial H}{\partial x^a}(x,p)\Bigr)(s) = 0 . \tag{7.51} $$

Now we can read from (7.50) that $Z$ is in the kernel of the Hessian. Conversely,
if $Z$ is in the kernel of the Hessian, the last integral in (7.50) vanishes for
all $\delta' x$ that represent elements $Z' \in T_\lambda\mathfrak{V}(H;q,q')$. Hence, the last
integral in (7.50) vanishes for all $\delta' x \in H^2([0,1],\mathbb{R}^n)$ with
$\delta' x(0) = 0$ and $\delta' x(1) = 0$. This follows from the fact that any such
$\delta' x$ is the coordinate representation of some $Z' \in T_\lambda\mathfrak{V}(H;q,q')$ up to
adding a multiple of the tangent field of $\lambda$ which drops out from (7.50)
anyway. The same trick was used already in the proof of Theorem 7.5.1. Here the
situation is even more convenient since $\lambda$ is a $C^\infty$ curve such that its
tangent field is, in particular, an $H^2$ map and not only an $H^1$ map. Now the
fundamental lemma of variational calculus implies that (7.51) has to hold. To
complete the proof that $Z$ is a Jacobi field we still have to verify that $Z$
is a $C^\infty$ map. We know that $(\delta x, \delta p) \in H^1([0,1],\mathbb{R}^{2n})$. By induction,
(7.51) and (7.41) imply that $(\delta x, \delta p) \in H^r([0,1],\mathbb{R}^{2n})$ for all
$r \in \mathbb{N}$, i.e., that $\delta x$ and $\delta p$ are, indeed, $C^\infty$ maps. □

In the terminology of Definition 5.6.3, this theorem implies that
$\mathrm{Hess}_\lambda\mathcal{A}_{q,q'}$ is degenerate if and only if $q' = \lambda(1)$ is conjugate to
$q = \lambda(0)$ along $\lambda$, and that the nullity of the Hessian equals the
multiplicity of the conjugate point. Please note that in each Jacobi class
along a ray $\lambda \in \mathfrak{V}(H)$ there is a unique representative $J \in T_\lambda\mathfrak{V}(H)$.

It is instructive to illustrate these results by specializing to the ray-optical 
structure of Example 5.1.5 where the rays are the geodesics of a (positive 
definite) Riemannian metric $g_+$. For the Hamiltonian given in this example,
$\mathfrak{V}(H)$ is the set of all $\lambda \in H^2([0,1],\mathcal{M})$ with
$g_+(\dot\lambda, \dot\lambda) = \mathrm{const.}$ and $\mathcal{A}$ is the $g_+$-length functional. In this
case $\mathrm{Hess}_\lambda\mathcal{A}_{q,q'}$ is (the $H^2$ version of) the standard index form of
Riemannian geometry and the notions of Jacobi fields and of conjugate points are
the familiar textbook ones, see e.g. Bishop and Crittenden [15], Chap. 11.
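
The Riemannian special case can be made concrete on the unit 2-sphere, where the classical Morse index theorem is easy to verify by hand. The following minimal sketch is an illustration under stated assumptions and is not taken from the text: it considers a great-circle arc of length $L$, restricts attention to normal variations $y(s)$ with $y(0) = y(L) = 0$, and uses the standard index form $\int_0^L (y'^2 - y^2)\, ds$ for curvature 1. On the modes $\sin(n\pi s/L)$ this form has the eigenvalues $(n\pi/L)^2 - 1$, and the conjugate points of the initial point lie at arc length $k\pi$. The function names and the tolerance `tol` are hypothetical.

```python
import math

def index_and_nullity(L, n_max=50, tol=1e-9):
    """Count negative and zero eigenvalues (n*pi/L)**2 - 1 of the index form
    on the modes sin(n*pi*s/L), n = 1, ..., n_max (unit sphere assumed)."""
    index = 0
    nullity = 0
    for n in range(1, n_max + 1):
        eigenvalue = (n * math.pi / L) ** 2 - 1.0
        if eigenvalue < -tol:
            index += 1
        elif abs(eigenvalue) <= tol:
            nullity += 1
    return index, nullity

def conjugate_points_before(L, tol=1e-9):
    """Number of conjugate points s = k*pi (k = 1, 2, ...) with 0 < s < L."""
    count = 0
    k = 1
    while k * math.pi < L - tol:
        count += 1
        k += 1
    return count

for L in (1.0, 4.0, 2 * math.pi, 7.0):
    print(L, index_and_nullity(L), conjugate_points_before(L))
```

In each case the index equals the number of conjugate points strictly before the endpoint, and the nullity is 1 exactly when the endpoint itself is conjugate, in line with the statement that the nullity of the Hessian equals the multiplicity of the conjugate point.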

Similarly, specializing to the ray-optical structure of Example 5.1.2 gives 
the analogous results for time-like geodesics of a Lorentzian metric which 
should be compared with Beem, Ehrlich and Easley [11], Sect. 10.1. Note that 
for the Hamiltonian $H$ given in Example 5.1.2, $\mathcal{A}$ is the negative Lorentzian
length functional on time-like curves. Switching from $H$ to $-H$ yields the
positive Lorentzian length functional instead.

Now we want to relate the Morse index of a critical point $\lambda$ of $\mathcal{A}_{q,q'}$ to
the number of conjugate points along $\lambda$, thereby generalizing the classical
Morse index theorem for Riemannian geodesics. Partly as a preparation for that
we prove the following criterion for a critical point to be a minimum. This
criterion applies to rays that are associated with a classical solution of the
eikonal equation. (Please recall Sect. 5.5.) It generalizes a classical theorem of
variational calculus, based on the so-called Weierstrass excess function, into
our setting of ray-optical structures. In the language of traditional variational
calculus, rays associated with a classical solution of the eikonal equation are
usually characterized as being “embedded in a field of extremals”.

Theorem 7.5.3. Let, in the situation of Theorem 7.5.1, $\lambda_0 \in \mathfrak{V}(H;q,q')$
be a ray. Assume that there is a classical solution $S\colon U \subseteq M \to \mathbb{R}$ of the
eikonal equation $H \circ dS = 0$ such that the lifted ray $\Xi(\lambda_0)$ is completely
contained in $dS(U)$, where $\Xi$ denotes the lifting map of (7.40). Then the
following holds true.

(a) If, for the Hamiltonian $H$ under consideration, the matrix on the left-hand
side of (5.15) is not only non-degenerate but even positive definite
along $\Xi(\lambda_0)$ (in one and thus in any natural chart), then $\lambda_0$ is a strict
local minimum of $\mathcal{A}_{q,q'}$.

(b) If the Hamiltonian $H\colon \mathcal{W} \to \mathbb{R}$ is defined on a domain $\mathcal{W} \subseteq T^*M$ such
that $\mathcal{W} \cap T^*_qM$ is convex for all $q \in M$, if the matrix $(\partial^2 H/\partial p_a \partial p_b)$
is positive definite at all points $u \in \mathcal{W}$ (in one and thus in any natural
chart), and if the domain $U$ of $S$ covers all curves $\lambda \in \mathfrak{V}(H;q,q')$, then
$\lambda_0$ is a strict global minimum of $\mathcal{A}_{q,q'}$.

Proof. We use the coordinate representation (7.45) of $\mathcal{A}$ and its restriction
$\mathcal{A}_{q,q'}$. As in the proof of Theorem 7.5.1 and Theorem 7.5.2, coordinates are
employed for notational convenience only. The following argument remains
valid even if $U$ cannot be covered by a single chart. Then we find for all
$\lambda \in \mathfrak{V}(H;q,q')$ contained in $U$

$$\mathcal{A}_{q,q'}(\lambda) - \mathcal{A}_{q,q'}(\lambda_0) = \int_0^1 \big(p_a\,\dot{x}^a\big)(s)\,ds - \int_0^1 \big(\partial_a S(x_0)\,\dot{x}_0^a\big)(s)\,ds . \tag{7.52}$$



Here $(x,p)$ is the coordinate representation of $\Xi(\lambda)$ whereas $(x_0, dS(x_0))$ is
the coordinate representation of $\Xi(\lambda_0)$. As $x(0) = x_0(0)$ and $x(1) = x_0(1)$,
we can replace $x_0$ by $x$ in the second integral. With (7.27) this puts (7.52)
into the form

$$\mathcal{A}_{q,q'}(\lambda) - \mathcal{A}_{q,q'}(\lambda_0) = \int_0^1 \Big( \big(p_a - \partial_a S(x)\big)\,\frac{\partial H}{\partial p_a}(x,p) \Big)(s)\,ds . \tag{7.53}$$



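If $\lambda$ is close to $\lambda_0$, $p(s)$ is close to $dS(x(s))$. In that case, as $H$ is a $C^\infty$ and
thus $C^2$ function defined on an open neighborhood of $\mathcal{N}$, Taylor's theorem
implies

$$\begin{aligned} H\big(x(s), dS(x(s))\big) ={}& H\big(x(s),p(s)\big) + \frac{\partial H}{\partial p_a}\big(x(s),p(s)\big)\,\big(\partial_a S(x(s)) - p_a(s)\big) \\ &+ \frac{1}{2}\,\frac{\partial^2 H}{\partial p_a \partial p_b}\big(x(s),p'(s)\big)\,\big(\partial_a S(x(s)) - p_a(s)\big)\big(\partial_b S(x(s)) - p_b(s)\big) \end{aligned} \tag{7.54}$$

for some $p'_a(s) = p_a(s) + \theta(s)\,\big(\partial_a S(x(s)) - p_a(s)\big)$ with $0 < \theta(s) < 1$. The
left-hand side of (7.54) vanishes since $S$ is a classical solution of the eikonal
equation. The first term on the right-hand side vanishes since all curves in
$\mathfrak{V}(H)$ satisfy (7.28). Hence, inserting (7.54) into (7.53) results in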

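$$\mathcal{A}_{q,q'}(\lambda) - \mathcal{A}_{q,q'}(\lambda_0) = \frac{1}{2}\int_0^1 \frac{\partial^2 H}{\partial p_a \partial p_b}\big(x(s),p'(s)\big)\,\big(\partial_a S(x(s)) - p_a(s)\big)\big(\partial_b S(x(s)) - p_b(s)\big)\,ds \tag{7.55}$$

for $\lambda$ sufficiently close to $\lambda_0$. Now the positive definiteness assumption of part
(a) implies that the matrix $(\partial^2 H/\partial p_a \partial p_b)(x(s),p(s))$ is positive definite on
vertical vectors tangent to $\mathcal{N}$. By continuity, for $\lambda$ sufficiently close to $\lambda_0$, the
integrand in (7.55) is strictly positive unless $p_a(s) = \partial_a S(x(s))$. The latter
equation holds for all $s \in [0,1]$ if and only if $\lambda = \lambda_0$ since $\lambda$ satisfies the
same boundary conditions and the same parametrization fixing condition as
$\lambda_0$. This completes the proof of part (a).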

If the convexity assumption of part (b) is satisfied, we can use (7.54)
even if $\lambda$ is not close to $\lambda_0$. Thus, under the assumptions of part (b), (7.55)
is valid for all $\lambda \in \mathfrak{V}(H;q,q')$, and the integrand is strictly positive unless
$p_a(s) = \partial_a S(x(s))$. From the proof of part (a) we know already that the
latter equation holds for all $s \in [0,1]$ if and only if $\lambda = \lambda_0$. □

From Proposition 5.5.5 we know that for a sufficiently short ray $\lambda_0$ we
can always find a classical solution $S$ of the eikonal equation such that $\Xi(\lambda_0)$
is contained in the image of $dS$. Hence, under the positive definiteness
assumption of Theorem 7.5.3 (a) a sufficiently short ray always gives a strict
local minimum of the respective functional $\mathcal{A}_{q,q'}$. This positive definiteness
assumption is satisfied, e.g., for the Hamiltonian of Example 5.1.5 which
gives the geodesics of a (positive definite) Riemannian metric, but also for
the Hamiltonian of Example 5.1.2 which gives the time-like geodesics of a
Lorentzian metric. Please note that we are always free to change $H$ into
$-H$, thereby inverting the sign of the functional $\mathcal{A}$ and turning minima into
maxima and vice versa.

We are now ready to prove a generalized Morse index theorem. 

Theorem 7.5.4. (Morse index theorem) Let, in the situation of Theorem 7.5.1,
$\lambda \in \mathfrak{V}(H;q,q')$ be a ray and assume that, for the Hamiltonian
considered, the matrix on the left-hand side of (5.15) is positive definite at
all points of $\Xi(\lambda)$, in one and hence in any natural chart. Then the extended
Morse index of $\lambda$ satisfies the equality

$$\mathrm{ind}_0\big(\mathrm{Hess}_\lambda \mathcal{A}_{q,q'}\big) = \sum_s n(s) . \tag{7.56}$$

Here the sum is to be taken over all $s \in \,]0,1]$ such that $\lambda(s)$ is conjugate to
$\lambda(0)$ along $\lambda$, and $n(s)$ denotes the multiplicity of this conjugate point. (Note
that, by Proposition 5.6.3, this sum is finite.)

Proof. Let $\overline{T_\lambda \mathfrak{V}(H;q,q')}$ be the closure of the tangent space $T_\lambda \mathfrak{V}(H;q,q')$
in $T_\lambda H^1([0,1], M)$, and let $\overline{\mathrm{Hess}}_\lambda \mathcal{A}_{q,q'}$ be the continuous extension of the
Hessian onto this space. To verify that this extension exists, we recall that
the Hessian is given, in terms of natural coordinates, by (7.50). (Again,
coordinates are used for notational convenience only. The following argument
remains true even if $\lambda$ cannot be covered by a single chart.) If we shift the
derivative from $\delta' p_a$ to $\delta x^a$ by means of a partial integration, we get a
manifestly $H^1$ continuous expression. Thus, the extended Hessian is given by

$$\overline{\mathrm{Hess}}_\lambda \mathcal{A}_{q,q'}(Z',Z) = \int_0^1 \Big( \delta' p_a\,\delta\dot{x}^a - \delta'\Big(\frac{\partial H}{\partial x^a}(x,p)\Big)\,\delta x^a \Big)(s)\,ds . \tag{7.57}$$

For each $s \in \,]0,1]$ we define a map $\lambda^s\colon [0,1] \to M$ by $\lambda^s(s') = \lambda(s's)$.
Clearly, $\lambda^s$ is a critical point of $\mathcal{A}_{q,\lambda(s)}$. To ease notation, we write

$$\mathfrak{H}_s = \overline{T_{\lambda^s}\mathfrak{V}(H;q,\lambda(s))} \quad\text{and}\quad \mathfrak{B}_s = \overline{\mathrm{Hess}}_{\lambda^s}\mathcal{A}_{q,\lambda(s)} . \tag{7.58}$$

If we choose $C^\infty$ vector fields $E_1,\dots,E_{n-1}$ along $\lambda$ such that the vectors
$E_1(s),\dots,E_{n-1}(s),\dot\lambda(s)$ are linearly independent for each $s \in [0,1]$, then the
Hilbert space $\mathfrak{H}_s$ can be identified with the Hilbert space

$$\mathfrak{H} = \{\, Z \in H^1([0,1],\mathbb{R}^{n-1}) \mid Z(0) = Z(1) = 0 \,\} \tag{7.59}$$

for each $s \in \,]0,1]$. Viewed in this sense as a one-parameter family of symmetric
bilinear forms $\mathfrak{B}_s\colon \mathfrak{H}\times\mathfrak{H} \to \mathbb{R}$ on a single Hilbert space, $\mathfrak{B}_s$ depends
continuously on $s$ in the weak sense, i.e., the map $s \mapsto \mathfrak{B}_s(Z,Z)$ is continuous
for all $Z \in \mathfrak{H}$. This follows from the behavior of the integral (7.57) under
parameter transformations. We now introduce the notation



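$$\begin{aligned} i(s) &= \mathrm{ind}(\mathfrak{B}_s) = \mathrm{ind}\big(\mathrm{Hess}_{\lambda^s}\mathcal{A}_{q,\lambda(s)}\big) , \\ i_0(s) &= \mathrm{ind}_0(\mathfrak{B}_s) = \mathrm{ind}_0\big(\mathrm{Hess}_{\lambda^s}\mathcal{A}_{q,\lambda(s)}\big) , \\ n(s) &= \mathrm{null}(\mathfrak{B}_s) = \mathrm{null}\big(\mathrm{Hess}_{\lambda^s}\mathcal{A}_{q,\lambda(s)}\big) , \end{aligned} \tag{7.60}$$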



where in each line the first equality is a definition and the second equality
holds since the process of continuous extension leaves index, extended index
and nullity unchanged. Note that, by Theorem 7.5.2, $n(s)$ is different from
zero if and only if $\lambda(s)$ is conjugate to $\lambda(0)$ along $\lambda$ and that it gives the
multiplicity of this conjugate point. Thus, by Proposition 5.6.3, $n(s)$ is different
from zero only at finitely many points $s \in \,]0,1]$, and at each of those points
it takes a finite value, see Figure 7.5. We shall now discuss the behavior of
the functions $i$ and $i_0\colon ]0,1] \to \mathbb{N}_0^\infty$, where $\mathbb{N}_0^\infty$ denotes the nonnegative
integers including infinity. To that end we define, for $0 < s < s' \le 1$, a map
$K_{s,s'}\colon \mathfrak{H}_s \to \mathfrak{H}_{s'}$ by

$$\big(K_{s,s'}(Z)\big)(s'') = \begin{cases} Z\big(\tfrac{s'}{s}\,s''\big) & \text{for } 0 \le s'' \le \tfrac{s}{s'} , \\[4pt] 0 & \text{for } \tfrac{s}{s'} \le s'' \le 1 . \end{cases} \tag{7.61}$$



Note that this map is, indeed, well defined. (This construction does not
work for $C^\infty$ curves. Therefore, the extension to $H^1$ curves was necessary.)
Clearly, $K_{s,s'}$ is linear, continuous, and injective (but not surjective, of course).
Moreover, with the help of (7.57) we find that $\mathfrak{B}_{s'}(Z',Z') = \mathfrak{B}_s(Z,Z)$ for
$Z' = K_{s,s'}(Z)$. Hence, if $\mathfrak{B}_s$ is negative definite (or negative semidefinite,
respectively) on a certain subspace of $\mathfrak{H}_s$, then this subspace is mapped by
$K_{s,s'}$ onto a space of the same dimension on which $\mathfrak{B}_{s'}$ is negative definite (or
negative semidefinite, respectively). This implies that the functions $i$ and $i_0$
are monotonic, i.e.,

$$i(s) \le i(s') \quad\text{and}\quad i_0(s) \le i_0(s') \quad\text{for } 0 < s < s' \le 1 . \tag{7.62}$$



Fig. 7.5. The left-continuous function $i(s)$ and the right-continuous function $i_0(s)$
have jumps at those isolated points where the nullity $n(s)$ is different from zero,
see the proof of Theorem 7.5.4.

We now fix a parameter value $s \in \,]0,1]$ and decompose the Hilbert space
$\mathfrak{H}_s = \mathfrak{H}$ into orthogonal subspaces $\mathfrak{H}^+ \oplus \mathfrak{H}^- \oplus \mathfrak{H}^0$, where $\mathfrak{H}^0$ is the kernel
of $\mathfrak{B}_s$ and $\mathfrak{B}_s$ is positive definite on $\mathfrak{H}^+$ and negative definite on $\mathfrak{H}^-$. Since $\mathfrak{B}_s$
depends continuously on $s$ in the way outlined above, $\mathfrak{B}_{s+\varepsilon}$ is still positive
definite on $\mathfrak{H}^+$ and negative definite on $\mathfrak{H}^-$ for $|\varepsilon|$ sufficiently small, i.e.,

$$i(s) \le i(s+\varepsilon) \quad\text{and}\quad i_0(s) \ge i_0(s+\varepsilon) \tag{7.63}$$

for $|\varepsilon|$ sufficiently small. (If $s = 1$, $\varepsilon$ must, of course, be negative. Otherwise,
(7.63) holds for positive and negative $\varepsilon$.) From (7.62) and (7.63) we can
determine the behavior of $i$ and $i_0$ in the following way, see Figure 7.5. On
each open interval on which $n$ is equal to zero, $i$ and $i_0 = i + n$ coincide. (7.63)
shows that $i = i_0$ must be constant on such an interval. Moreover, (7.63)
and (7.62) imply that $i$ is left-continuous whereas $i_0$ is right-continuous, i.e.,
$i(s-0) = i(s)$ and $i_0(s+0) = i_0(s)$. Thus, at each of the finitely many points
$s$ where $n(s) \neq 0$, the function $i_0$ jumps by an amount of $i_0(s) - i(s) = n(s)$.
This gives the equality



$$i_0(1) = i_0(\varepsilon) + \sum_s n(s) \tag{7.64}$$

where $\varepsilon$ must be so small that $n$ vanishes on the interval $]0,\varepsilon]$. From
Proposition 5.5.5 we know that a sufficiently short ray can be associated with a
classical solution of the eikonal equation. We can thus apply Theorem 7.5.3
(a) to the ray $\lambda^\varepsilon$. This shows that $i_0(\varepsilon) = i(\varepsilon)$ must be equal to zero. Together
with (7.64), this proves the desired result. □

Specialized to the ray-optical structure of Example 5.1.5, where the rays
are the geodesics of a Riemannian metric, Theorem 7.5.4 reproduces the
classical Morse index theorem for Riemannian geodesics. As a matter of fact,
our proof of Theorem 7.5.4 followed the proof of the classical Morse index
theorem, as it is given, e.g., in Bishop and Crittenden [15], Chap. 11, as
closely as possible.

Specialized to the ray-optical structure of Example 5.1.2, where the rays
are the time-like geodesics of a Lorentzian metric, Theorem 7.5.4 reproduces
the Morse index theorem for time-like geodesics, cf. Beem, Ehrlich, and Easley
[11], Sect. 10.1.





For applications of the Morse index theorem one usually restricts to the
case that $\lambda(1)$ is not conjugate to $\lambda(0)$ along $\lambda$, i.e., that the Hessian is
non-degenerate. In this case (7.56) has the following consequences.

$\lambda$ is free of conjugate points if and only if $\lambda$ is a local minimum of $\mathcal{A}_{q,q'}$.
There is a point $\lambda(s)$ conjugate to $\lambda(0)$ along $\lambda$, for some $s \in \,]0,1[$, if and
only if $\lambda$ is a saddle-point of $\mathcal{A}_{q,q'}$. Maxima cannot occur since, by
Proposition 5.6.3, the right-hand side of (7.56) is finite.
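
To make the statement of the Morse index theorem tangible, the following minimal Python sketch treats the simplest Riemannian instance of Example 5.1.5, a great-circle arc of length $L$ on the unit 2-sphere. It assumes (this is an illustration, not part of the proof above) that for variations orthogonal to the geodesic the index form reduces to $I(Z,Z) = \int_0^L (Z'^2 - Z^2)\,ds$ with $Z(0) = Z(L) = 0$. The sketch discretizes this form by finite differences and counts its negative eigenvalues through the signs of the $LDL^T$ pivots (Sylvester's law of inertia). The function names, the grid size and the sample lengths are hypothetical choices.

```python
import math

def index_form_negative_count(length, n_grid=2000):
    """Count negative eigenvalues of the finite-difference version of
    I(Z, Z) = integral_0^L (Z'(s)^2 - Z(s)^2) ds with Z(0) = Z(L) = 0."""
    h = length / (n_grid + 1)
    diag = 2.0 / h**2 - 1.0   # diagonal entry of the tridiagonal matrix
    off = -1.0 / h**2         # off-diagonal entry
    # The number of negative LDL^T pivots equals the number of negative
    # eigenvalues of a symmetric matrix (Sylvester's law of inertia).
    negatives = 0
    pivot = diag
    for _ in range(n_grid):
        if pivot == 0.0:
            pivot = 1e-30     # guard against an exact zero pivot
        if pivot < 0.0:
            negatives += 1
        pivot = diag - off * off / pivot
    return negatives

def conjugate_point_count(length):
    """Conjugate points s = k*pi strictly inside ]0, length[ on the unit
    2-sphere; each of them has multiplicity one."""
    k = 0
    while (k + 1) * math.pi < length:
        k += 1
    return k

for factor in (0.5, 2.5, 3.5):   # arc lengths in units of pi (sample values)
    L = factor * math.pi
    print(factor, index_form_negative_count(L), conjugate_point_count(L))
```

For arc lengths that are not multiples of $\pi$ the endpoint is not conjugate to the initial point, so the Hessian is non-degenerate and the two counts should agree: short arcs are minima, longer arcs are saddle-points.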




8. Applications 



In this chapter we illustrate our results with examples and indicate some
applications to astrophysics and astronomy. In the beginning we show how
our formalism can be used to rederive some standard textbook results. Later
we give some more sophisticated applications.



8.1 Doppler effect, aberration, and drag effect 
in isotropic media 

For a light ray passing through a medium, a moving observer will register 
(a) a different frequency, (b) a different spatial direction and (c) a different 
velocity in comparison to an observer who is at rest with respect to the 
medium. It is our goal to calculate the respective formulae for an isotropic 
medium, thereby determining (a) the Doppler effect, (b) the aberration and 
(c) the drag effect in such a medium. We perform these calculations on an 
arbitrary Lorentzian spacetime manifold for a medium in arbitrary motion. 
However, in essence this is an exercise in special relativity since only algebraic 
calculations on tangent spaces are involved. 

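According to Sect. 6.3, light propagation in an isotropic medium on a
Lorentzian spacetime manifold $(M,g)$ is given by a Hamiltonian in terms of
an optical metric $g_o$,

$$H(x,p) = \tfrac{1}{2}\,g_o^{ab}(x)\,p_a p_b = \tfrac{1}{2}\,\Big( g^{ab}(x) - \big(n(x)^2 - 1\big)\,U^a(x)\,U^b(x) \Big)\,p_a p_b . \tag{8.1}$$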



Here the $g^{ab}$ are the contravariant components of the spacetime metric $g$,
the $U^a$ are the components of a vector field $U$ on $M$ with $g_{ab}U^aU^b = -1$
that gives the rest system of the medium, and the function $n\colon M \to [1,\infty[$ is
the index of refraction. By assuming that $n$ is bounded below by 1 and independent
of the frequency $-U^a p_a$ we restrict to ray-optical structures that are causal
and dilation-invariant. The latter restriction means that the following results
apply to non-dispersive media only. At the end of this section we shall briefly
comment on the dispersive case.

Fig. 8.1. $\mathfrak{u}$ is the spatial velocity of the medium in the reference frame $V$ and $-\mathfrak{v}$
is the spatial velocity of the $V$-observers in the rest system of the medium.


We now consider another vector field $V$ with $g_{ab}V^aV^b = -1$. The relative
velocity of the observer field $V$ with respect to the observer field $U$ is given
by a function $\beta\colon M \to [0,1[$ defined by

$$g_{ab}U^aV^b = -\frac{1}{\sqrt{1-\beta^2}} . \tag{8.2}$$

Here we assume that $U$ and $V$ point into the same half of the light cone of $g$. Then the
normalization conditions on $U$ and $V$ imply that, indeed, $g_{ab}U^aV^b \le -1$; so
$\beta$ is well-defined. With the help of this function $\beta$ we introduce vector fields
$\mathfrak{u}$ and $\mathfrak{v}$ via

$$\mathfrak{u}^a = \sqrt{1-\beta^2}\;U^a - V^a \quad\text{and}\quad \mathfrak{v}^a = U^a - \sqrt{1-\beta^2}\;V^a , \tag{8.3}$$

which obviously satisfy

$$g_{ab}\,\mathfrak{u}^aV^b = g_{ab}\,\mathfrak{v}^aU^b = 0 . \tag{8.4}$$

$\mathfrak{u}$ is the spatial velocity vector field of the medium in the reference frame $V$,
whereas $-\mathfrak{v}$ is the spatial velocity vector field of the $V$-observers in the rest
system of the medium, see Figure 8.1.

At each point of a light ray its momentum can be decomposed, with
respect to the observer field $V$ and with respect to the observer field $U$, into
frequency and spatial wave covector according to (6.8) and (6.9),

$$\omega = -p_aV^a \quad\text{and}\quad k_a = p_a - \omega\,g_{ab}V^b , \tag{8.5}$$

$$\omega^* = -p_aU^a \quad\text{and}\quad k^*_a = p_a - \omega^*\,g_{ab}U^b . \tag{8.6}$$

Here and in the following, quantities in the rest system are marked with an
asterisk. In terms of these quantities the dispersion relation $g_o^{ab}p_ap_b = 0$ can
be written in either of the two following forms.

$$(1-\beta^2)\,\big(g^{ab}k_ak_b - \omega^2\big) - (n^2-1)\,\big(k_a\mathfrak{u}^a - \omega\big)^2 = 0 , \tag{8.7}$$

$$g^{ab}k^*_ak^*_b - n^2\,\omega^{*2} = 0 . \tag{8.8}$$

Moreover, a quick calculation shows that

$$k_a\,\mathfrak{u}^a = \omega - \sqrt{1-\beta^2}\;\omega^* , \tag{8.9}$$

$$k^*_a\,\mathfrak{v}^a = \sqrt{1-\beta^2}\;\omega - \omega^* . \tag{8.10}$$
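
The purely algebraic character of these relations can be checked numerically on a single tangent space. The following Python sketch is an illustration under stated assumptions: it works on Minkowski space with signature $(+,+,+,-)$, reads the decomposition (8.5), (8.6) as $k_a = p_a - \omega\,g_{ab}V^b$, and uses hypothetical sample values for $\beta$ and for the momentum covector $p$. It then compares both sides of $k_a\mathfrak{u}^a = \omega - \sqrt{1-\beta^2}\,\omega^*$ and $k^*_a\mathfrak{v}^a = \sqrt{1-\beta^2}\,\omega - \omega^*$.

```python
import math

ETA = (1.0, 1.0, 1.0, -1.0)  # Minkowski metric diag(+,+,+,-), x^4 is the time coordinate

def dot(a, b):
    """g_ab a^a b^b for two vectors."""
    return sum(g * x * y for g, x, y in zip(ETA, a, b))

def pair(covector, vector):
    """Natural pairing p_a X^a."""
    return sum(p * x for p, x in zip(covector, vector))

def decompose(p, observer):
    """Frequency and spatial wave covector of p with respect to an observer."""
    omega = -pair(p, observer)
    k = [pa - omega * g * oa for pa, g, oa in zip(p, ETA, observer)]
    return omega, k

beta = 0.6                                # hypothetical relative speed
root = math.sqrt(1.0 - beta**2)
U = (0.0, 0.0, 0.0, 1.0)                  # rest system of the medium
V = (0.0, 0.0, beta / root, 1.0 / root)   # second observer field
u = [root * a - b for a, b in zip(U, V)]  # spatial velocity of the medium w.r.t. V
v = [a - root * b for a, b in zip(U, V)]  # minus the velocity of V w.r.t. the medium
p = (0.3, -0.2, 0.7, -1.1)                # arbitrary momentum covector (sample values)

omega, k = decompose(p, V)
omega_star, k_star = decompose(p, U)

print(pair(k, u), omega - root * omega_star)        # two sides of (8.9)
print(pair(k_star, v), root * omega - omega_star)   # two sides of (8.10)
```

Note that neither identity uses the dispersion relation; they hold for every covector $p$, which is why an arbitrary $p$ suffices for the check.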



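The spatial direction of a ray is determined by its ray velocity (6.11) which
can be calculated with respect to the observer field $V$ and with respect to
the observer field $U$. With the Hamiltonian (8.1) we find

$$v^a = -\frac{g_o^{ab}\,p_b}{V^c g_{cd}\,g_o^{de}\,p_e} - V^a , \tag{8.11}$$

$$v^{*a} = -\frac{g_o^{ab}\,p_b}{U^c g_{cd}\,g_o^{de}\,p_e} - U^a . \tag{8.12}$$

If we use the dispersion relation, a straightforward calculation puts (8.11)
and (8.12) into the forms

$$v^a = \frac{\sqrt{1-\beta^2}\;g^{ab}k_b + (n^2-1)\,\omega^*\,\mathfrak{u}^a}{\sqrt{1-\beta^2}\;\omega + (n^2-1)\,\omega^*} , \tag{8.13}$$

$$v^{*a} = \frac{g^{ab}k^*_b}{n^2\,\omega^*} . \tag{8.14}$$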



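This shows that the ray velocity is not parallel to the spatial wave vector
except in the rest system of the medium. (In the vacuum case $n = 1$, every
observer field can be viewed as the rest system.) To characterize the spatial
direction of the ray we introduce angles $\theta$ and $\theta^*$ via

$$g_{ab}\,v^a\,\mathfrak{u}^b = \sqrt{g_{ab}\,v^av^b}\;\sqrt{g_{cd}\,\mathfrak{u}^c\mathfrak{u}^d}\;\cos\theta , \tag{8.15}$$

$$g_{ab}\,v^{*a}\,\mathfrak{v}^b = \sqrt{g_{ab}\,v^{*a}v^{*b}}\;\sqrt{g_{cd}\,\mathfrak{v}^c\mathfrak{v}^d}\;\cos\theta^* . \tag{8.16}$$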



We are now ready to derive the desired results. 

(a) Doppler effect 

After our preparations, the Doppler formula is easily derived by inserting
(8.14) into (8.16). With the help of (8.8) and (8.10) this results in

$$\omega = \omega^*\,\frac{1 + n\,\beta\cos\theta^*}{\sqrt{1-\beta^2}} . \tag{8.17}$$

In the vacuum case $n = 1$, (8.17) reduces to the standard Doppler formula
which is given in any textbook on special relativity for the case that $U$ and
$V$ are inertial systems on Minkowski space. Our argument proves the (rather
trivial) fact that, pointwise, the same formula holds for observers in arbitrary
motion on an arbitrary Lorentzian spacetime manifold. From (8.17) we read
that the transverse Doppler effect ($\theta^* = \pi/2$) is unaffected by $n$. This reflects
the well-known fact that the transverse Doppler effect is caused by time
dilation alone. Linearization of the relativistic Doppler formula with respect
to $\beta$ yields the classical Doppler formula. The quadratic corrections to this
formula were verified for the first time by Ives and Stilwell [65] in a laboratory
experiment with canal rays, cf., e.g., French [44], Sect. 5.7.

The Doppler formula (8.17) should not be confused with the redshift for- 
mula (6.23). Contrary to (6.23), (8.17) compares two frequencies at the same 
point with respect to two different observers. Whenever frequency measure- 
ments at two different points are to be compared one should use the redshift 
formula (6.23). The latter is of paramount importance in cosmology where 
the influence of a medium is usually considered to be negligible. The redshift 
formula in a medium has some relevance in view of precision experiments 
with so-called microwave links in our Solar system, see, e.g., Bertotti [13]. 
In these experiments, microwaves are exchanged between two spacecraft or
between a spacecraft and the Earth, and the emitted and received frequencies
are measured with a relative accuracy of or 10“^®. Owing to this high 
accuracy, the influence of the interplanetary medium (or, for signals grazing 
the Sun, of the Solar corona) on the frequency shift is very well measurable 
in experiments of this kind. It is true that such frequency measurements with 
microwave links are usually called “Doppler measurements”; nonetheless, it 
is not the Doppler formula (8.17) but rather the redshift formula (6.23) which 
provides a theoretical basis for those measurements. 

As a typical application of the Doppler formula (8.17) to astronomy we
consider an inertial system $V$ on Minkowski space and we assume, as an
idealization, that our galaxy is at rest with respect to $V$ in the temporal
average. We assume that the worldline of the Earth is an integral curve of
$U$. Then, along the worldline of the Earth, the function $\beta$ defined by (8.2)
gives the velocity of the Earth relative to $V$ in units of the vacuum velocity
of light. This is mainly determined by the orbital motion of the Solar system
around the center of our galaxy, with smaller corrections coming (i) from the
peculiar motion of the Solar system, (ii) from the yearly rotation of the Earth
around the Sun, and (iii) from the daily rotation of the Earth. This orbital
motion takes place with a velocity of about $\beta = 0.00083$ which corresponds
to 250 km/s in conventional units. With $\theta^* = \pi$ and $n = 1$, (8.17) yields
$\omega^* = 1.00083\,\omega$. Thus, an observer on the Earth sees starlight coming to us
("head-on") from the apex of the Solar motion blueshifted by about 0.083 %
in comparison to a fictitious observer at the same place who is at rest with
respect to our galaxy. For light coming to us at $\theta^* = \pi/2$, the Doppler effect
is purely transverse and yields a tiny redshift of only 0.00003 %. Since these
calculations were done with $n = 1$, the influence of our atmosphere was
ignored. In the optical regime the atmosphere can be treated as an isotropic
non-dispersive medium with $n = 1.0003$. To within the given accuracy, this
leaves the above results unchanged. When using the Doppler formula (8.17)
in situations like that with $n \neq 1$ it is important to keep in mind that not
only the observed frequency $\omega^*$ but also the fiducial frequency $\omega$ is to be
measured in the medium.
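
The numbers quoted in this example can be reproduced with a few lines of Python. The sketch below assumes the Doppler formula in the form $\omega = \omega^*\,(1 + n\beta\cos\theta^*)/\sqrt{1-\beta^2}$ and evaluates the ratio $\omega^*/\omega$; the function name `doppler_ratio` is a hypothetical choice made for this illustration.

```python
import math

def doppler_ratio(beta, theta_star, n=1.0):
    """Return omega_star / omega for relative speed beta (in units of c),
    rest-frame ray angle theta_star and index of refraction n."""
    return math.sqrt(1.0 - beta**2) / (1.0 + n * beta * math.cos(theta_star))

beta = 0.00083                      # orbital motion around the galactic center
head_on = doppler_ratio(beta, math.pi)
transverse = doppler_ratio(beta, math.pi / 2)
in_air = doppler_ratio(beta, math.pi, n=1.0003)

print(f"head-on:    omega*/omega = {head_on:.5f}")
print(f"transverse: redshift     = {(1.0 - transverse) * 100:.5f} %")
print(f"head-on in air (n = 1.0003): change = {abs(in_air - head_on):.1e}")
```

The last line illustrates why the atmosphere does not matter at the quoted accuracy: replacing $n = 1$ by $n = 1.0003$ changes the ratio only in the seventh decimal place.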



(b) Aberration 

We now turn to the derivation of the aberration formula by inserting (8.13)
into (8.15). After some algebraic manipulations using (8.7), (8.9), and (8.17),
we find

$$\cos\theta = \frac{\cos\theta^* + n\,\beta}{\sqrt{(n + \beta\cos\theta^*)^2 - (1-\beta^2)(n^2-1)}} . \tag{8.18}$$

Please note that, by (8.15) and (8.16), $\theta$ and $\theta^*$ are defined in terms of
the ray velocity (= group velocity) and not in terms of the phase velocity.
Thus, (8.18) gives the aberration of rays, as it is measured with an ordinary
telescope, and not the aberration of wave surfaces, as it is measured with
adaptive optics devices. This makes a difference since, as long as $n \neq 1$, the
direction of the ray velocity does not coincide with the direction of the phase
velocity.

Setting $n = 1$ in (8.18) yields the standard aberration formula for vacuum
which is given in any textbook on special relativity for the case that $U$ and
$V$ are inertial systems on Minkowski space. With $\beta = 9.92 \cdot 10^{-5}$ (orbital
velocity of the Earth around the Sun, in units of the vacuum velocity of
light) and $\theta = \pi/2$ (light coming from a star $S$ at the pole of the ecliptic), this
vacuum aberration formula yields $\cos\theta^* = -0.000099$. Hence, at the celestial
sphere of an observer on the Earth the star $S$ performs a yearly circular
motion with radius 20.5" around the pole of the ecliptic. By an analogous
calculation, a star which is not at the pole of the ecliptic performs a yearly
elliptical motion with major semi-axis 20.5". This effect was observed already
in 1728 by Bradley.
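
As a quick numerical cross-check of this example, the following Python sketch evaluates the aberration formula in the form $\cos\theta = (\cos\theta^* + n\beta)/\sqrt{(n+\beta\cos\theta^*)^2 - (1-\beta^2)(n^2-1)}$. For $n = 1$ and $\theta = \pi/2$ the numerator must vanish, so $\cos\theta^* = -\beta$; the deviation of $\theta^*$ from $\pi/2$ is then converted to arc seconds. The function name `cos_theta` is a hypothetical choice.

```python
import math

def cos_theta(cos_theta_star, beta, n=1.0):
    """Aberration formula: cosine of the ray angle seen by the V-observers."""
    numerator = cos_theta_star + n * beta
    denominator = math.sqrt((n + beta * cos_theta_star)**2
                            - (1.0 - beta**2) * (n**2 - 1.0))
    return numerator / denominator

beta = 9.92e-5                # orbital velocity of the Earth in units of c
cos_star = -beta              # solves cos(theta) = 0 in the vacuum case n = 1
# theta_star = pi/2 + asin(beta); the excess over pi/2 is the aberration angle
shift_arcsec = math.degrees(math.asin(beta)) * 3600.0

print(f"cos(theta) at cos(theta*) = -beta: {cos_theta(cos_star, beta):.1e}")
print(f"aberration angle: {shift_arcsec:.2f} arc seconds")
```

The resulting angle of roughly 20.5 arc seconds is the radius of the yearly aberrational circle mentioned above.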

It was found by Airy in the 19th century that the aberrational ellipses are
unchanged if they are measured with a telescope filled with water (n = 1.5)
rather than with air (n = 1), cf., e.g., Preston [121], p. 538. At first sight,
this result seems to be at variance with (8.18). However, (8.18) only says that
the relation between $\theta$ and $\theta^*$ depends on $n$. A deeper analysis shows that, if
the telescope is filled with water rather than with air, the observed angle $\theta^*$
remains unaffected whereas the fiducial angle $\theta$ changes. The situation is quite
analogous to the Doppler effect. Both the Doppler formula and the aberration
formula give a relation between two quantities measured by different observers
at the same place in the same medium.



(c) Drag effect 

Our next goal is to visualize the dependence of the ray velocity on the spatial
direction. By (8.11), the dispersion relation $g_o^{ab}p_ap_b = 0$ implies

$$(g_o)_{ab}\,(v^a + V^a)(v^b + V^b) = 0 . \tag{8.19}$$

Here the

$$(g_o)_{ab} = n^2\,g_{ab} + (n^2-1)\,g_{ac}U^c\,g_{bd}U^d \tag{8.20}$$

are the covariant components of the optical metric, $(g_o)_{ab}\,g_o^{bc} = n^2\,\delta_a^c$. After
some algebraic manipulations (8.19) takes the form

$$(1-\beta^2)\,n^2\,\big(g_{ab}\,v^av^b - 1\big) + (n^2-1)\,\big(g_{ab}\,\mathfrak{u}^av^b - 1\big)^2 = 0 . \tag{8.21}$$



This equation demonstrates that the indicatrix (6.14) of a non-dispersive
isotropic medium with respect to an arbitrary observer field $V$ is an ellipsoid,
see Figure 8.2. Our causality assumption $n \ge 1$ implies that this ellipsoid is
completely contained in the vacuum light sphere, $g_{ab}\,v^av^b \le 1$. If we pass to
the rest system, the indicatrix turns into a sphere,

$$g_{ab}\,v^{*a}v^{*b} = \frac{1}{n^2} . \tag{8.22}$$
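
The ellipsoid equation can be tested against the ordinary special-relativistic addition of velocities. The Python sketch below is an illustration under stated assumptions: it takes a rest-frame ray velocity of modulus $1/n$ at angle $\theta^*$ to the axis of relative motion, transforms it to the frame in which the medium moves with speed $\beta$ along that axis, and evaluates the left-hand side of (8.21), read here as $(1-\beta^2)\,n^2\,(g_{ab}v^av^b - 1) + (n^2-1)\,(g_{ab}\mathfrak{u}^av^b - 1)^2$. The function name and the sample values of $\beta$ and $n$ are hypothetical.

```python
import math

def indicatrix_residual(theta_star, beta, n):
    """Left-hand side of the ellipsoid equation for a ray that has speed 1/n
    and angle theta_star to the axis of motion in the rest system."""
    w_par = math.cos(theta_star) / n     # rest-frame component along the axis
    w_perp = math.sin(theta_star) / n    # rest-frame transverse component
    # Special-relativistic velocity addition; the medium moves with speed beta.
    denom = 1.0 + beta * w_par
    v_par = (w_par + beta) / denom
    v_perp = w_perp * math.sqrt(1.0 - beta**2) / denom
    v_squared = v_par**2 + v_perp**2     # g_ab v^a v^b
    u_dot_v = beta * v_par               # g_ab u^a v^b
    return ((1.0 - beta**2) * n**2 * (v_squared - 1.0)
            + (n**2 - 1.0) * (u_dot_v - 1.0)**2)

angles = [i * math.pi / 12 for i in range(13)]   # 0, 15, ..., 180 degrees
residuals = [indicatrix_residual(t, 0.4, 1.5) for t in angles]
print(max(abs(r) for r in residuals))
```

The residual should vanish up to rounding errors for every direction, which confirms that the transformed velocities fill out the ellipsoid, and that setting $\beta = 0$ recovers the sphere of radius $1/n$.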



For the sake of completeness we also calculate the figuratrix (6.13) to visualize
the dependence of the phase velocity on spatial directions. The definition
(6.10) of the phase velocity implies that

$$\frac{1}{\omega}\,k_a = \frac{w_a}{g^{bc}\,w_bw_c} . \tag{8.23}$$

With (8.23), the dispersion relation (8.7) yields

$$(1-\beta^2)\,g^{ab}w_aw_b\,\big(1 - g^{cd}w_cw_d\big) = (n^2-1)\,\big(g^{ab}w_aw_b - \mathfrak{u}^aw_a\big)^2 . \tag{8.24}$$



The figuratrix is, thus, a fourth order surface, see Figure 8.2. Our assumption
$n \ge 1$ guarantees that $g^{ab}w_aw_b \le 1$, i.e., not only the ray velocity but also
the phase velocity is bounded by the vacuum velocity of light. If we pass to
the rest system of the medium, the figuratrix turns into a sphere,

$$g^{ab}\,w^*_aw^*_b = \frac{1}{n^2} . \tag{8.25}$$



Hence, for the rest system figuratrix and indicatrix coincide if we identify 
tangent space and cotangent space in the usual way with the help of the 
spacetime metric. 

For rays parallel or antiparallel to the relative motion we can use the
relations $g_{ab}\,\mathfrak{u}^av^b = \pm\beta\,\sqrt{g_{cd}\,v^cv^d}$ and $w_a\mathfrak{u}^a = \pm\beta\,\sqrt{g^{cd}\,w_cw_d}$. In this
situation (8.21) and (8.24) imply

$$\sqrt{g_{ab}\,v^av^b} = \sqrt{g^{ab}\,w_aw_b} = \frac{\frac{1}{n} \pm \beta}{1 \pm \frac{\beta}{n}} . \tag{8.26}$$

Fig. 8.2. This picture shows the figuratrix (top) and the indicatrix (bottom) for
different values of the observer's velocity in an isotropic medium with index of
refraction $n > 1$. The vertical axis is chosen parallel to the observer's velocity
relative to the medium. Analytically, the figuratrix is given by the fourth order
equation (8.24) whereas the indicatrix is an ellipsoid given by (8.21). The dashed
circle indicates the vacuum light sphere, i.e., figuratrix and indicatrix for $n = 1$.
Please note that for $1/n < \beta < 1$ the observer's velocity exceeds the velocity of light
in the medium whereas it is still limited by the vacuum velocity of light. In each
case the intersections with the vertical axis are determined by (8.26).



Note that, by (8.22) and (8.25), $1/n$ is the absolute value of the ray
velocity and of the phase velocity in the rest system of the medium. Hence,
(8.26) says that, in the relative direction of motion, both the ray velocity and
the phase velocity obey the familiar relativistic addition theorem for spatial
velocities. For the phase velocity, this result can be tested by interference
experiments with light propagating through moving fluids. Such experiments
were performed by Fizeau in 1851, who verified the equation

$$\sqrt{g^{ab}\,w_aw_b} = \frac{1}{n} \pm \beta\,\Big(1 - \frac{1}{n^2}\Big) , \tag{8.27}$$

which had been heuristically suggested earlier by Fresnel. Obviously, (8.27)
follows from (8.26) by neglecting quadratic and higher order terms in $\beta$. On
the basis of 19th century physics, the factor $(1 - 1/n^2)$ in (8.27) was hard to
understand. If light propagates in an ether, and if spatial velocities are to
be added in the Newtonian way, then (8.27) seems to suggest that the ether
is "partially dragged along" by the medium. If we stick to this outdated
terminology, Figure 8.2 illustrates the drag effect in an isotropic medium for
all spatial directions.
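
How well the Fresnel formula approximates the relativistic addition theorem is easy to quantify. The following Python sketch compares the exact expression $(1/n + \beta)/(1 + \beta/n)$ with its linearization $1/n + \beta\,(1 - 1/n^2)$; negative $\beta$ corresponds to the lower sign. The values $n = 1.33$ (water) and $\beta = 10^{-4}$ are hypothetical sample values, not data of Fizeau's experiment.

```python
def exact_speed(n, beta):
    """Relativistic addition of the rest-frame speed 1/n and the speed beta."""
    return (1.0 / n + beta) / (1.0 + beta / n)

def fresnel_speed(n, beta):
    """Linearized formula with the Fresnel drag coefficient 1 - 1/n^2."""
    return 1.0 / n + beta * (1.0 - 1.0 / n**2)

n, beta = 1.33, 1.0e-4          # sample values (assumptions)
for b in (beta, -beta):
    difference = exact_speed(n, b) - fresnel_speed(n, b)
    # Leading correction expected from the expansion: -(b^2 / n) (1 - 1/n^2)
    predicted = -(b**2 / n) * (1.0 - 1.0 / n**2)
    print(f"beta = {b:+.0e}: exact - Fresnel = {difference:.3e}, "
          f"second-order estimate = {predicted:.3e}")
```

The difference is of second order in $\beta$, which is why the drag coefficient alone sufficed to describe 19th century interference experiments.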

We end this discussion with a quick remark on generalizations to dispersive
isotropic media. From Sect. 6.3 we know that then the Hamiltonian (8.1)
is still valid, but now $n$ is not only a function of the spacetime point but also
of the frequency $\omega^* = -U^ap_a$. It is easy to check that this generalization
leaves the Doppler formula (8.17) unchanged, whereas in the aberration
formula (8.18) $n$ has to be replaced by $n + \omega^*\,dn/d\omega^*$ everywhere. For those
frequencies for which $\omega^*\,dn/d\omega^*$ is small compared to $n$, (8.18) is still a valid
approximation. As long as the function $\omega^* \mapsto n(\omega^*)$ has not been specified,
nothing can be said about the form of indicatrix and figuratrix with respect
to an arbitrary observer field. In the rest system, indicatrix and figuratrix
are spheres as in the non-dispersive case, but the radius of either sphere now
depends on the frequency.



8.2 Light rays in a uniformly accelerated medium 
on Minkowski space 

As in the preceding section we consider an isotropic medium, i.e., a Hamiltonian
of the form (8.1). This time we specialize to the case that the spacetime
metric is the Minkowski metric,

$$g = (dx^1)^2 + (dx^2)^2 + (dx^3)^2 - (dx^4)^2 , \tag{8.28}$$

and we restrict our considerations to the subset

$$M = \{\, (x^1,x^2,x^3,x^4) \in \mathbb{R}^4 \mid (x^3)^2 > (x^4)^2 \,\} \tag{8.29}$$

of Minkowski space. The index of refraction is supposed to be a constant
$n \ge 1$ and the medium is supposed to be in uniformly accelerated motion,

$$U = \frac{1}{\sqrt{(x^3)^2 - (x^4)^2}}\,\Big( x^4\,\frac{\partial}{\partial x^3} + x^3\,\frac{\partial}{\partial x^4} \Big) , \tag{8.30}$$

see Figure 8.3. The integral curves of this vector field are known as Rindler
observers, and $M$ is known as the Rindler wedge, cf. Rindler [123], Sect. 8.6.
For the calculation of light rays in this medium it is convenient to introduce
new coordinates $(x,y,z,t)$ via

$$x^1 = x , \qquad x^2 = y , \qquad x^3 = z\cosh t , \qquad x^4 = z\sinh t . \tag{8.31}$$



The momenta transform according to

$$p_1 = p_x , \qquad p_2 = p_y , \qquad p_3 = p_z\cosh t - \frac{p_t}{z}\,\sinh t , \qquad p_4 = -p_z\sinh t + \frac{p_t}{z}\,\cosh t . \tag{8.32}$$

Fig. 8.3. The Rindler observer field $U$ occupies a wedge-shaped region of Minkowski
space.

In the new coordinates, the Minkowski metric reads

$$g = dx^2 + dy^2 + dz^2 - z^2\,dt^2 , \tag{8.33}$$

the Rindler wedge is represented as

$$M = \{\, (x,y,z,t) \in \mathbb{R}^4 \mid z > 0 \,\} , \tag{8.34}$$

and the observer field (8.30) takes the simple form

$$U = \frac{1}{z}\,\frac{\partial}{\partial t} . \tag{8.35}$$



Inserting into (8.1) yields the Hamiltonian

$$H(x,y,z,t,p_x,p_y,p_z,p_t) = \frac{1}{2}\,\Big( p_x^2 + p_y^2 + p_z^2 - \frac{n^2}{z^2}\,p_t^2 \Big) . \tag{8.36}$$



As the coordinate $t$ does not appear in the Hamiltonian $H$, the dispersion
relation $H = 0$ determines a stationary ray-optical structure in the sense of
Definition 6.5.1, $W = \partial/\partial t \in \mathcal{G}_{\mathcal{N}}$. (As $W$ is orthogonal to the hypersurfaces
$t = \mathrm{const.}$, this ray-optical structure is even globally static.) By
Proposition 6.2.2, this implies that the function $f = \ln z$ is a redshift potential for
the observer field (8.35). In other words, the redshift under which a Rindler
observer at $z = z_1$ is seen by a Rindler observer at $z = z_2$ is given by the
formula

$$\frac{\omega_2}{\omega_1} = \frac{z_1}{z_2} , \tag{8.37}$$

cf. equation (6.26). This result is true for any (constant) value of the index
of refraction $n$.

With the global timing function $t$ and any real constant $\omega_0 \neq 0$, all
assumptions of the reduction theorem (i.e., of Theorem 6.5.1) are satisfied.
This gives us a reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_0}$ for each $\omega_0$ on

$$\hat{M} = \{\, (x,y,z) \in \mathbb{R}^3 \mid z > 0 \,\} . \tag{8.38}$$



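By (6.78), we find a Hamiltonian $\hat{H}$ for this reduced ray-optical structure
simply by setting $p_t$ equal to $-\omega_0$ in (8.36),

$$\hat{H}(x,y,z,p_x,p_y,p_z) = \frac{1}{2}\,\big( p_x^2 + p_y^2 + p_z^2 \big) - \frac{n^2\,\omega_0^2}{2\,z^2} . \tag{8.39}$$

Hence the dispersion relation of $\hat{\mathcal{N}}_{\omega_0}$ takes the form

$$\hat{g}^{\mu\nu}\,p_\mu p_\nu = n^2\,\omega_0^2 , \tag{8.40}$$

where the $\hat{g}^{\mu\nu}$ are the contravariant components of the Riemannian metric

$$\hat{g} = \frac{dx^2 + dy^2 + dz^2}{z^2} . \tag{8.41}$$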



The Riemannian manifold $(\hat{M},\hat{g})$ is the so-called Poincaré half-space which
is discussed in many textbooks on differential geometry, see, e.g., Thorpe [143],
p. 236 and p. 242.

We have thus shown that the rays of $\hat{\mathcal{N}}_{\omega_0}$ coincide with the geodesics of
the Poincaré half-space. It is well known and easily verified that the latter
are all those half-circles in $\hat{M}$ that meet the surface $z = 0$ orthogonally,
see Figure 8.4. Please note that the rays of $\hat{\mathcal{N}}_{\omega_0}$ are independent of the
(constant) index of refraction $n$. They are, of course, also independent of $\omega_0$
which reflects the fact that our medium is non-dispersive.
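
The half-circle property can also be observed numerically. The Python sketch below integrates Hamilton's equations for the reduced Hamiltonian, read here as $\hat{H} = \tfrac{1}{2}(p_x^2 + p_y^2 + p_z^2) - n^2\omega_0^2/(2z^2)$, with a classical fourth-order Runge-Kutta scheme. It assumes a ray in the plane $y = 0$ with $p_y = 0$; the starting point, the initial angle `alpha`, the step size and the values of $n$ and $\omega_0$ are hypothetical choices. The sketch monitors how far the numerical ray deviates from the half-circle centered on $z = 0$ that is tangent to the initial direction.

```python
import math

def rhs(state, c):
    """Hamilton's equations for H = (px^2 + pz^2)/2 - c/(2 z^2), c = (n*omega0)^2."""
    x, z, px, pz = state
    return (px, pz, 0.0, -c / z**3)

def rk4_step(state, h, c):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))
    k1 = rhs(state, c)
    k2 = rhs(shifted(state, k1, h / 2), c)
    k3 = rhs(shifted(state, k2, h / 2), c)
    k4 = rhs(shifted(state, k3, h), c)
    return tuple(s + h / 6 * (a + 2 * b + 2 * d + e)
                 for s, a, b, d, e in zip(state, k1, k2, k3, k4))

def max_circle_deviation(n=1.5, omega0=1.0, alpha=0.3, steps=500, h=1e-3):
    """Largest deviation of (x - x0)^2 + z^2 from R^2 along the numerical ray."""
    c = (n * omega0)**2
    x, z = 0.0, 1.0
    p_abs = n * omega0 / z                 # modulus of p from H = 0
    state = (x, z, p_abs * math.cos(alpha), p_abs * math.sin(alpha))
    x0 = x + z * math.tan(alpha)           # center of the half-circle on z = 0
    radius_sq = (z / math.cos(alpha))**2
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, h, c)
        worst = max(worst, abs((state[0] - x0)**2 + state[1]**2 - radius_sq))
    return worst

for n in (1.0, 1.5, 2.0):
    print(n, max_circle_deviation(n=n))
```

The center and radius of the half-circle do not depend on $n$; only the speed at which the circle is traversed does, in agreement with the remark that the rays are independent of the index of refraction.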

This calculation exemplifies our findings of Sect. 6.6. There we have seen
that the rays of a reduced ray-optical structure are the geodesics of a
Riemannian metric $\hat{g}$ whenever the following two properties are satisfied. The
stationary ray-optical structure to which the reduction formalism is applied
must be given as the null cone of a Lorentzian metric $g_o$, and the time-like
vector field $W \in \mathcal{G}_{\mathcal{N}}$ must be hypersurface-orthogonal with respect to $g_o$. In
particular, the optical path length is then given as the $\hat{g}$-length which implies
that Fermat's principle reduces to the geodesic variational problem for the
metric $\hat{g}$.

Fig. 8.4. The rays of $\hat{\mathcal{N}}_{\omega_0}$ are the geodesics of the Poincaré half-space which are
half-circles.



8.3 Light propagation in a plasma on Kerr spacetime 

If we consider the plasma model of Chap. 3, light rays propagating in a
non-magnetized plasma on an arbitrary Lorentzian spacetime manifold $(M,g)$ are
determined by a Hamiltonian $H$ of the form

$$H(x,p) = \frac{1}{2}\,\big( g^{ab}(x)\,p_ap_b + \omega_p(x)^2 \big) . \tag{8.42}$$

Here the $g^{ab}$ are the contravariant components of the spacetime metric $g$ and
the spacetime function $\omega_p$ is the "plasma frequency" which is determined by
the electron density of the plasma according to (3.51). More precisely, we have
seen in Chap. 3 that our plasma model gives a dispersion relation with three
branches, determined by three Hamiltonians (3.44), (3.45) and (3.46), and
that only the third Hamiltonian, which is of the form (8.42), is associated with
light rays passing through the plasma. If the plasma frequency has no zeros
(i.e., if the plasma covers the whole spacetime region under consideration),
the ray-optical structure determined by the Hamiltonian (8.42) is of the kind
considered in Example 5.1.2.

In this section we want to discuss the rays of this ray-optical structure for 
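the special case that the underlying Lorentzian manifold $(\mathcal{M}, g)$ is the Kerr
spacetime. In Boyer-Lindquist coordinates the Kerr metric reads

$$g = \rho^2 \left( \frac{dr^2}{\Delta} + d\vartheta^2 \right) + (r^2 + a^2) \sin^2\vartheta \, d\varphi^2 - dt^2 + \frac{2mr}{\rho^2} \left( a \sin^2\vartheta \, d\varphi - dt \right)^2 , \tag{8.43}$$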
where $\rho^2 = r^2 + a^2 \cos^2\vartheta$ and $\Delta = r^2 - 2mr + a^2$, see, e.g., Hawking and
Ellis [59], Sect. 5.6. We assume that the real constants $m$ and $a$ satisfy the
conditions $m > 0$ and $a^2 < m^2$. Then the Kerr metric mathematically models
the spacetime region around a rotating (but uncharged) black hole with mass
$m$ and angular momentum $ma$. In the region where $r$ is large enough it also
gives a valid approximation for the spacetime around a rotating star. For
$a = 0$ the Kerr metric reduces to the Schwarzschild metric which models the
spacetime region around any spherically symmetric massive body.

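From (8.43) we can calculate the contravariant components $g^{ab}$ of the
metric. This puts the Hamiltonian (8.42) into the form

$$H = \frac{\Delta\, p_r^2 + p_\vartheta^2}{2\rho^2} + \frac{\rho^2 - 2mr}{2 \Delta \rho^2 \sin^2\vartheta} \left( p_\varphi - \frac{2mra \sin^2\vartheta}{\rho^2 - 2mr}\, p_t \right)^2 - \frac{\rho^2\, p_t^2}{2\rho^2 - 4mr} + \frac{\omega_p^2}{2} . \tag{8.44}$$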



It should be noted that the Kerr metric is a vacuum solution of Einstein’s 
field equation. Hence, the use of the Hamiltonian (8.44) is physically justified 
as long as the gravitational field produced by the plasma can be neglected. 
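As a plausibility check on the closed form of (8.44), the following sketch compares it with the Hamiltonian (8.42) evaluated from the covariant components of the Kerr metric (8.43). The function names and the sample numbers are illustrative assumptions and do not appear in the text; only the block structure of the Boyer-Lindquist metric (a 2x2 block in $t$ and $\varphi$, diagonal entries for $r$ and $\vartheta$) is used to invert it.

```python
import math

def _kerr_scalars(r, theta, m, a):
    """rho^2, Delta and sin^2(theta) as defined below (8.43)."""
    rho2 = r * r + a * a * math.cos(theta) ** 2
    delta = r * r - 2.0 * m * r + a * a
    s2 = math.sin(theta) ** 2
    return rho2, delta, s2

def kerr_hamiltonian_closed_form(r, theta, p, m, a, omega_p):
    """Closed form (8.44); p = (p_t, p_r, p_theta, p_phi)."""
    pt, pr, pth, pph = p
    rho2, delta, s2 = _kerr_scalars(r, theta, m, a)
    shift = 2.0 * m * r * a * s2 / (rho2 - 2.0 * m * r)
    return ((delta * pr ** 2 + pth ** 2) / (2.0 * rho2)
            + (rho2 - 2.0 * m * r) / (2.0 * delta * rho2 * s2) * (pph - shift * pt) ** 2
            - rho2 * pt ** 2 / (2.0 * rho2 - 4.0 * m * r)
            + 0.5 * omega_p ** 2)

def kerr_hamiltonian_from_metric(r, theta, p, m, a, omega_p):
    """(8.42) with g^{ab} obtained by inverting the metric (8.43) directly."""
    pt, pr, pth, pph = p
    rho2, delta, s2 = _kerr_scalars(r, theta, m, a)
    # Covariant components of the (t, phi) block read off from (8.43).
    g_tt = -(1.0 - 2.0 * m * r / rho2)
    g_tph = -2.0 * m * r * a * s2 / rho2
    g_phph = (r * r + a * a) * s2 + 2.0 * m * r * a * a * s2 * s2 / rho2
    det = g_tt * g_phph - g_tph ** 2
    # Inverse of the 2x2 block; the r and theta entries are diagonal.
    inv_tt, inv_tph, inv_phph = g_phph / det, -g_tph / det, g_tt / det
    quad = (delta / rho2 * pr ** 2 + pth ** 2 / rho2
            + inv_tt * pt ** 2 + 2.0 * inv_tph * pt * pph + inv_phph * pph ** 2)
    return 0.5 * (quad + omega_p ** 2)
```

Evaluating both functions at a point outside the ergosphere, for instance `r = 5`, `theta = 1`, `m = 1`, `a = 0.5`, should give the same number up to rounding errors.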

In the following we restrict to the region where the vector field $\partial/\partial t$ is
time-like, i.e.,

$$r > m + \sqrt{m^2 - a^2 \cos^2\vartheta} . \tag{8.45}$$



This is the region outside the so-called ergosphere. Moreover, we assume that
the plasma frequency is independent of $t$ whereas it may depend arbitrarily
on $r$, $\varphi$ and $\vartheta$. In other words, we assume that the electron density of the
plasma is stationary. Please note that, for our plasma model, the velocity of
the plasma has no influence on the light rays and can therefore be arbitrary.
Under these assumptions the vector field $W = \partial/\partial t$ generates a time-like
symmetry, i.e., our ray-optical structure is stationary in the sense of
Definition 6.5.1, and the coordinate function $t$ is a timing function in the sense of
Definition 6.5.2.

We want to carry through the reduction process of Theorem 6.5.1 in order
to discuss the spatial paths of light rays. To that end we have to choose a real
number $\omega_o \neq 0$ for the frequency and we have to restrict to the region where
the inequality

$$\omega_p^2 < \frac{\rho^2}{\rho^2 - 2mr}\, \omega_o^2 \tag{8.46}$$



is satisfied. It can be read directly from (8.44) that a ray with $p_t = -\omega_o$
cannot exist outside this region. It is then easy to check that all hypotheses
of Theorem 6.5.1 are satisfied, i.e., that we get a reduced ray-optical
structure $\hat{\mathcal{N}}_{\omega_o}$ on the 3-manifold $\hat{\mathcal{M}}_{\omega_o}$ determined by the inequalities (8.45) and
(8.46). We get a Hamiltonian for this reduced ray-optical structure simply by
replacing the conserved momentum coordinate $p_t$ in (8.44) by the constant
$-\omega_o$, cf. (6.78). As always, we are free to multiply this Hamiltonian
with an arbitrary nowhere vanishing function. This shows that

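$$\hat{H}(r,\vartheta,\varphi,p_r,p_\vartheta,p_\varphi) = \frac{1}{2} \left( \frac{\Delta\, p_r^2 + p_\vartheta^2 + \dfrac{\rho^2 - 2mr}{\Delta \sin^2\vartheta} \left[ p_\varphi + \dfrac{2mra \sin^2\vartheta}{\rho^2 - 2mr}\, \omega_o \right]^2}{\rho^2\, \omega_o^2 \left( \dfrac{\rho^2}{\rho^2 - 2mr} - \dfrac{\omega_p^2}{\omega_o^2} \right)} - 1 \right) \tag{8.47}$$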

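is a Hamiltonian for the reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_o}$. This Hamiltonian
$\hat{H}$ is of the form (6.103), with the Riemannian metric $\hat{g}$ and the one-form $\hat{\phi}$
given by

$$\hat{g} = \rho^2 \left( \frac{\rho^2}{\rho^2 - 2mr} - \frac{\omega_p^2}{\omega_o^2} \right) \left( \frac{dr^2}{\Delta} + d\vartheta^2 + \frac{\Delta \sin^2\vartheta}{\rho^2 - 2mr}\, d\varphi^2 \right) , \tag{8.48}$$

$$\hat{\phi} = \frac{2mra \sin^2\vartheta}{\rho^2 - 2mr}\, d\varphi . \tag{8.49}$$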



The lifted rays of $\hat{\mathcal{N}}_{\omega_o}$ are, thus, determined by (6.104), (6.105), and (6.106),
whereas the rays are determined by (6.107) and (6.108). In analogy to (6.111),
the optical path length takes the form

$$\ell(\lambda) = \int_{s_1}^{s_2} \left( \sqrt{\hat{g}\big(\dot{\lambda}(s),\dot{\lambda}(s)\big)} - \hat{\phi}\big(\dot{\lambda}(s)\big) \right) ds . \tag{8.50}$$

By Fermat's principle, the light rays of frequency $\omega_o$ between any two points
in $\hat{\mathcal{M}}_{\omega_o}$ are the extremals of the functional $\ell$. In the Schwarzschild case $a = 0$,
the rays are exactly the $\hat{g}$-geodesics, otherwise they are modified by a kind of
Coriolis force. Contrary to the situation considered in (6.111), here the metric
$\hat{g}$ depends on the frequency $\omega_o$, thereby reflecting the fact that our plasma is
a dispersive medium. By the same token, the optical path length functional
(8.50) does not give the travel time with respect to the timing function $t$,
except in the vacuum case $\omega_p = 0$. Please note that the limit $\omega_o \to \infty$ leads
to the same result as setting the function $\omega_p$ equal to zero; hence, in the limit
of infinite frequency the rays approach the vacuum rays.

In the following we want to use these general results to calculate the total
angular deflection of light rays in the equatorial plane $\vartheta = \pi/2$. From now
on we assume that the plasma frequency $\omega_p$ is a function of $r$ alone, i.e.,
that the electron density of the plasma is rotationally symmetric. Then the
$\varphi$-component of (6.105) takes the form

$$\dot{\varphi} = \frac{(r - 2m) \left( \dfrac{p_\varphi}{\omega_o} + \dfrac{2ma}{r - 2m} \right)}{r\, (r^2 - 2mr + a^2) \left( \dfrac{r}{r - 2m} - \dfrac{\omega_p(r)^2}{\omega_o^2} \right)} \tag{8.51}$$



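and the $\varphi$-component of (6.106) says that $p_\varphi$ is a constant of motion. On the
other hand, (6.107) yields

$$\left( \frac{r}{r - 2m} - \frac{\omega_p(r)^2}{\omega_o^2} \right) \left( \frac{r^2\, \dot{r}^2}{r^2 - 2mr + a^2} + \frac{(r^2 - 2mr + a^2)\, r\, \dot{\varphi}^2}{r - 2m} \right) = 1 . \tag{8.52}$$

Upon dividing (8.52) by $\dot{\varphi}^2$ and using (8.51) on the right-hand side we find

$$\frac{r^2}{r^2 - 2mr + a^2} \left( \frac{dr}{d\varphi} \right)^2 + \frac{r\, (r^2 - 2mr + a^2)}{r - 2m} = \left( \frac{r}{r - 2m} - \frac{\omega_p(r)^2}{\omega_o^2} \right) \frac{r^2\, (r^2 - 2mr + a^2)^2}{(r - 2m)^2 \left( \dfrac{p_\varphi}{\omega_o} + \dfrac{2ma}{r - 2m} \right)^2} . \tag{8.53}$$

For each possible choice of the constant of motion $p_\varphi$ this equation
determines the orbits of the corresponding light rays. In the following we are only
interested in light rays that come in from $r = \infty$, reach a minimum radial
coordinate $r = R$, and go out to $r = \infty$ afterwards, i.e., we exclude all light
rays that are captured by the central body. Then $dr/d\varphi$ must have a zero at
$r = R$ and (8.53) allows us to express the constant of motion $p_\varphi$ in terms of $R$
in the following way.

$$\left( \frac{p_\varphi}{\omega_o} + \frac{2ma}{R - 2m} \right)^2 = \frac{R\, (R^2 - 2mR + a^2)}{R - 2m} \left( \frac{R}{R - 2m} - \frac{\omega_p(R)^2}{\omega_o^2} \right) . \tag{8.54}$$

With the help of this equation, (8.53) takes the form

$$\frac{r\, (r - 2m)}{(r^2 - 2mr + a^2)^2} \left( \frac{dr}{d\varphi} \right)^2 = \frac{h(r)^2}{\left( \dfrac{2ma}{r - 2m} - \dfrac{2ma}{R - 2m} \pm h(R) \right)^2} - 1 , \tag{8.55}$$

where we have introduced the abbreviation

$$h(r) = \sqrt{ \frac{r\, (r^2 - 2mr + a^2)}{r - 2m} \left( \frac{r}{r - 2m} - \frac{\omega_p(r)^2}{\omega_o^2} \right) } . \tag{8.56}$$

Solving (8.55) for $d\varphi$ and integrating over the whole ray results in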







$$\Delta\varphi = \pm 2 \int_R^\infty \frac{\sqrt{r\, (r - 2m)}}{r^2 - 2mr + a^2} \left( \frac{h(r)^2}{\left( \dfrac{2ma}{r - 2m} - \dfrac{2ma}{R - 2m} \pm h(R) \right)^2} - 1 \right)^{-1/2} dr , \tag{8.57}$$

where the upper sign is valid for corotating rays ($\dot{\varphi} > 0$) and the lower sign
is valid for counterrotating rays ($\dot{\varphi} < 0$). The difference between $\Delta\varphi$ and
$\pm\pi$ gives the total deflection angle of the ray, see Figure 8.5. If the function
$\omega_p(r)$ has been specified, this deflection angle can be calculated to arbitrary
accuracy from (8.57), e.g., by numerical integration. The result depends, of
course, on the frequency $\omega_o$ which is hidden in the function $h(r)$.
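The numerical integration mentioned for (8.57) can be sketched as follows. This is only an illustration under stated assumptions: units with $G = c = 1$, a substitution $r = R/\cos\psi$ chosen here to remove the integrable singularity at $r = R$ and to map the infinite range to a finite one, and a plain midpoint rule. The function name, the parameters and the default step number are hypothetical and are not taken from the text.

```python
import math

def kerr_equatorial_deflection(R, m, a, corotating=True,
                               omega_p=lambda r: 0.0, omega_o=1.0, n=4000):
    """Return |Delta phi| - pi from (8.57) for a ray with closest approach R."""
    sign = 1.0 if corotating else -1.0

    def delta(r):
        return r * r - 2.0 * m * r + a * a

    def h(r):
        # Abbreviation (8.56).
        return math.sqrt(r * delta(r) / (r - 2.0 * m)
                         * (r / (r - 2.0 * m) - (omega_p(r) / omega_o) ** 2))

    def drag(r):
        # Term 2ma/(r - 2m) appearing in (8.54) to (8.57).
        return 2.0 * m * a / (r - 2.0 * m)

    h_R, drag_R = h(R), drag(R)
    dpsi = 0.5 * math.pi / n
    total = 0.0
    for k in range(n):
        psi = (k + 0.5) * dpsi           # midpoints avoid psi = 0 and psi = pi/2
        c = math.cos(psi)
        r = R / c                        # substitution r = R / cos(psi)
        dr_dpsi = R * math.sin(psi) / (c * c)
        denom = drag(r) - drag_R + sign * h_R
        root = math.sqrt(h(r) ** 2 / denom ** 2 - 1.0)
        total += math.sqrt(r * (r - 2.0 * m)) / delta(r) / root * dr_dpsi
    return 2.0 * total * dpsi - math.pi
```

For `m = 1`, `R = 100` and `a = 0` the result should be close to the weak-field value $4m/R$ plus a small second-order correction. For `a > 0` the corotating ray comes out slightly less deflected than the counterrotating one at the same `R`. The sketch assumes that `R` lies outside the region where rays are captured, so that the square roots stay real.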




Fig. 8.5. The deviation of $\Delta\varphi$ from $\pm\pi$ gives the light deflection.



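In the Schwarzschild case $a = 0$ the formula for the deflection angle
simplifies to

$$|\Delta\varphi| = 2 \int_R^\infty \frac{dr}{\sqrt{r\, (r - 2m)}\, \sqrt{\dfrac{h(r)^2}{h(R)^2} - 1}} \tag{8.58}$$

where the function $h(r)$ is now given by

$$h(r) = r \sqrt{ \frac{r}{r - 2m} - \frac{\omega_p(r)^2}{\omega_o^2} } . \tag{8.59}$$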



This formula can be used, e.g., for calculating the deflection of light rays in 
the Solar corona. Phenomenological formulae for the electron density n(r) 
and, thus, for the plasma frequency 



$$\omega_p(r) = \sqrt{ \frac{4\pi e^2}{m_e}\, n(r) } \tag{8.60}$$
in the Solar corona can be found in the literature, see, e.g., Zheleznyakov [151].
Actually, the electron density in the Solar corona shows a considerable
temporal variation, roughly synchronized with the Solar activity cycle of about
11 years. As an average, one often uses the so-called Baumbach-Allen formula

$$n(r) = \left( 1.55 \left( \frac{r_\odot}{r} \right)^{6} + 2.99 \left( \frac{r_\odot}{r} \right)^{16} \right) \frac{10^8}{\mathrm{cm}^3} , \tag{8.61}$$

where $r_\odot$ denotes the radius of the Sun. With $\omega_p(r)$ specified by such a
phenomenological formula, the integral in (8.58) can be calculated numerically.
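A minimal sketch of such a numerical evaluation of (8.58) with the Baumbach-Allen density (8.61) is given below. Several ingredients are assumptions made for illustration and are not taken from the text: lengths are measured in km, $G M_\odot / c^2 \approx 1.4766$ km, the plasma frequency is evaluated with the common numerical rule $f_p \approx 8.98\,\mathrm{kHz} \cdot \sqrt{n/\mathrm{cm}^{-3}}$, and the integral is computed with the substitution $r = R/\cos\psi$ and a midpoint rule. All function names are hypothetical.

```python
import math

M_SUN_KM = 1.4766     # G M_sun / c^2 in km (assumed value)
R_SUN_KM = 6.96e5     # Solar radius in km, as quoted in the text

def baumbach_allen_density(r_km):
    """Electron density (8.61) in cm^-3 at radial coordinate r_km."""
    x = R_SUN_KM / r_km
    return (1.55 * x ** 6 + 2.99 * x ** 16) * 1.0e8

def plasma_frequency_hz(r_km):
    """f_p = omega_p / (2 pi), using f_p ~ 8.98 kHz * sqrt(n / cm^-3)."""
    return 8.98e3 * math.sqrt(baumbach_allen_density(r_km))

def corona_deflection_arcsec(R_km, freq_hz=None, n=4000):
    """|Delta phi| - pi from (8.58), (8.59) in arcseconds.

    freq_hz is the ray frequency omega_o / (2 pi); None means vacuum."""
    m = M_SUN_KM

    def h2(r):
        ratio2 = 0.0 if freq_hz is None else (plasma_frequency_hz(r) / freq_hz) ** 2
        return r * r * (r / (r - 2.0 * m) - ratio2)   # square of (8.59)

    h2_R = h2(R_km)
    dpsi = 0.5 * math.pi / n
    total = 0.0
    for k in range(n):
        psi = (k + 0.5) * dpsi
        c = math.cos(psi)
        r = R_km / c                                  # substitution r = R / cos(psi)
        dr_dpsi = R_km * math.sin(psi) / (c * c)
        total += dr_dpsi / (math.sqrt(r * (r - 2.0 * m))
                            * math.sqrt(h2(r) / h2_R - 1.0))
    return (2.0 * total * dpsi - math.pi) * 180.0 * 3600.0 / math.pi
```

For a ray passing at four Solar radii the vacuum value is close to $4m/R$, and switching on the corona at a radio frequency such as 8.4 GHz lowers the result, since a plasma whose density falls off outwards bends rays away from the Sun. At low frequencies or small `R` the argument of the square root may become negative, which signals that the ray cannot reach that radius.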

In the case $\omega_p(r) = 0$ or, equivalently, for $\omega_o \to \infty$, (8.58) gives the
deflection of vacuum light rays in the Schwarzschild metric,

$$|\Delta\varphi| = 2 \int_R^\infty \frac{R^2\, dr}{\sqrt{r}\, \sqrt{R\, (R - 2m)\, r^3 - R^4\, (r - 2m)}} . \tag{8.62}$$

If we linearize this elliptic integral with respect to $m/R$, we find

$$|\Delta\varphi| = 2 \int_R^\infty \frac{R\, dr}{r \sqrt{r^2 - R^2}} + 2m \int_R^\infty \frac{(r^3 - R^3)\, dr}{r^2 \sqrt{r^2 - R^2}^{\,3}} + O\!\left( \frac{m^2}{R^2} \right) . \tag{8.63}$$

The two integrals on the right-hand side can be calculated in an elementary
fashion with the substitution $u = R/r$. This results in the standard textbook
formula

$$|\Delta\varphi| = \pi + \frac{4m}{R} + O\!\left( \frac{m^2}{R^2} \right) \tag{8.64}$$

for vacuum light rays in the Schwarzschild metric, cf., e.g., Wald [146],
eq. (6.3.43), or Straumann [136], eq. (3.4.6).

The deflection given by formula (8.64) can be modeled with the help of a
logarithmically shaped lens with an index of refraction $n > 1$, see Figure 8.6.
For a rotationally symmetric lens with a profile given by the equation

$$x + k \ln y = \mathrm{const.} , \tag{8.65}$$

Snell's law implies that rays parallel to the axis are deflected by $\delta = \frac{k}{y}(n - 1)$
up to terms quadratic in $k/y$. Comparison with (8.64) shows that, to within
linear approximation, this value coincides with the deflection in a Schwarzschild
spacetime where $\frac{k}{4}(n - 1)$ corresponds to the mass $m$. (Here it goes without
saying that one has to identify the deflection angle $\delta$ produced at the surface
of the lens with the total deflection angle $\delta = |\Delta\varphi| - \pi$ in the
Schwarzschild metric.) Thus, a lens with the appropriate logarithmic shape can be
used to visualize approximately the light deflection by a spherically symmetric
gravitating body. Such plastic lenses have actually been manufactured and
are often used in didactic demonstrations. For practical instructions and
additional theoretical information we refer, e.g., to Higbie [64] and to Nandor
and Helliwell [99].

Fig. 8.6. To within certain approximations, the light deflection in a Schwarzschild
spacetime can be mimicked with the help of a logarithmically shaped lens.
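The small-angle statement about the logarithmic lens can be illustrated with a short sketch. It assumes, as an idealization not spelled out in the text, that the ray travels parallel to the axis inside the lens (for instance after entering through a flat back face) and leaves through the curved face $x + k \ln y = \mathrm{const.}$, whose normal makes the angle $\arctan(k/y)$ with the axis. The function names are hypothetical.

```python
import math

def lens_deflection_exact(k, y, n):
    """Deflection of an axis-parallel ray leaving the curved face at height y."""
    theta_in = math.atan(k / y)                    # angle between ray and surface normal
    theta_out = math.asin(n * math.sin(theta_in))  # Snell's law, lens material to air
    return theta_out - theta_in

def lens_deflection_linear(k, y, n):
    """Linearized deflection (n - 1) k / y."""
    return (n - 1.0) * k / y

def equivalent_mass(k, n):
    """Mass m for which 4 m / R equals the linearized lens deflection at y = R."""
    return k * (n - 1.0) / 4.0
```

For `k / y = 0.01` and `n = 1.5` the exact and the linearized values agree to better than one part in a thousand. The exact expression is only defined as long as `n * sin(theta_in)` does not exceed 1, i.e., below the onset of total internal reflection.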

Similarly to a lens in ordinary optics, a gravitational field can lead to 
multiple imaging or to the effect that a pointlike light source is seen as an 
extended object, e.g. as an arc or as a ring. In situations like that we speak 
of "gravitational lensing". This will be the topic of the next section.



8.4 Gravitational lensing 

In the last section we have rediscovered the relevant formulae for light rays 
being curved by the gravitational field of a massive body. For a light ray not 
directly influenced by matter, passing a spherically symmetric body of mass 
m at a minimal radial distance R, the deflection angle is given by formula 
(8.62) or, to within linear approximation with respect to $m/R$, by formula
(8.64). For a light ray grazing the surface of the Sun, $m \approx 1.5$ km and
$R \approx 696\,000$ km, this gives a deflection angle

$$\delta = |\Delta\varphi| - \pi \approx 1.75'' . \tag{8.66}$$
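The number in (8.66) follows from the linearized formula (8.64) by simple arithmetic, as the following sketch shows. The numerical constants (the gravitational parameter of the Sun and the speed of light) are standard values inserted here as assumptions; the Solar radius is the one quoted in the text. The function names are hypothetical.

```python
import math

GM_SUN = 1.32712440018e20   # m^3 / s^2, gravitational parameter of the Sun
C_LIGHT = 299792458.0       # m / s
R_SUN = 6.96e8              # m, Solar radius as quoted in the text

def solar_mass_length_m():
    """Mass of the Sun in length units, m = G M / c^2, in metres."""
    return GM_SUN / C_LIGHT ** 2

def grazing_deflection_arcsec():
    """Linearized deflection 4 m / R from (8.64), converted to arcseconds."""
    delta_rad = 4.0 * solar_mass_length_m() / R_SUN
    return delta_rad * 180.0 * 3600.0 / math.pi
```

With these inputs `solar_mass_length_m()` is about 1.48 km, which the text rounds to 1.5 km, and `grazing_deflection_arcsec()` gives about 1.75 arcseconds. Half of that value corresponds to the Newtonian estimate mentioned in the text.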

The simple assumption of light particles having a non-vanishing mass, leaving
Newtonian physics unchanged otherwise, would lead to only half that value,
as was found by Johann von Soldner already in 1801, see Lenard [79]. It was
the greatest triumph in the history of general relativity when the relativistic
value (8.66) of the deflection angle was confirmed, to within tolerable error
bounds, by observations during a total solar eclipse in the year 1919. Historical
details on the 1919 expedition, organized by the Royal Astronomical Society
of London and headed by Arthur Eddington, can be found, e.g., in Pais [103],
p. 303. In later years the development of radio telescopes made it possible to
measure the relativistic deflection of rays at any time, not just during a total
solar eclipse, and with strongly increasing accuracy. Recent measurements,
using very-long-baseline interferometry, have confirmed the relativistic value
to within 0.02%, see Lebach et al. [78]. Here the influence of the Solar corona
on the deflection of radio rays has to be taken into account. As a matter
of fact, nowadays measurements of this kind are performed chiefly with the
intention to gain information about the Solar corona.




Fig. 8.7. (a) Rotationally symmetric situation. (b) Non-symmetric situation.
In a rotationally symmetric situation, gravitational lensing can lead to
the effect that a pointlike light source is seen as a ring around the deflector. In a
non-symmetric situation, there might be a number of discrete images.



For an observer on the Earth, the deflection of starlight by the
gravitational field of the Sun causes only a tiny distortion of the configurations.
However, much more drastic effects are possible if (i) the mass-to-radius ratio
of the deflecting body is bigger than that of the Sun, and/or (ii) the distance
between observer and deflecting body is bigger than the distance between
Earth and Sun. Then it is even possible that the observer sees more than
one image of a light source at his or her celestial sphere. In a rotationally
symmetric situation, the observer would see a pointlike light source as a ring
around the deflector; in a less symmetric situation there might be a number
of isolated images, see Figure 8.7. It has become common to speak of
gravitational lensing in situations like that. This term was indirectly introduced by
Lodge [85] who was the first to discuss the question of whether the effect of
the gravitational field of the Sun upon light rays is similar to that of a lens. It
should be mentioned that Lodge's discussion cannot be viewed as genuinely
general-relativistic since it is based on an ether theory. (Incidentally, it is well
known that Sir Oliver Lodge always maintained a skeptical if not rejecting
attitude towards general relativity.) Therefore, it is better justified to credit
Eddington [34] [35] and Chwolson [28] who independently pointed out the
in-principle possibility of gravitational lensing on the basis of general relativity.
In particular, Chwolson [28] was the first to mention the ring phenomenon
depicted in Figure 8.7 (a). At that time the practical observability of
gravitational lensing was a completely open question. In his only publication on this
subject, Einstein [39] gave a deeply pessimistic view. (From a scribbled
calculation in Einstein's private notebook, discovered only in the 1990s, we know
that he had thought about multiple imaging by gravitational fields already
in 1912, when the final formalism of general relativity was still to be found.)
Zwicky [152] was the first to consider gravitational lensing by extragalactic
objects, but his subsequent observations remained without success. It was
not until 1979 that the first promising candidate for gravitational lensing
was found. In that year Walsh, Carswell, and Weymann [147] suggested that
the double quasar 0957+561 is, actually, only one quasar which is
gravitationally lensed by an intervening galaxy. By now, this explanation is accepted
by a large majority of astrophysicists, and many other promising candidates
for gravitational lensing have been found, including multiple quasars, radio
rings, and luminous arcs. For detailed reviews we refer to Schneider, Ehlers,
and Falco [128] and to Refsdal and Surdej [122]. In addition, the reader may
consult a regularly updated electronic review by Wambsganss [145] and a
forthcoming book on mathematical aspects of gravitational lensing by
Petters, Levine and Wambsganss [115].

Purely spatial pictures, such as Figure 8.7, are appropriate to illustrate 
gravitational lensing in stationary situations only. In time-dependent situ- 
ations (e.g., if the deflector is moving non-stationarily) it is inevitable to 
switch to a spacetime description. If, in addition, the effect of media on the 
light rays is to be taken into account, we are led to studying gravitational 
lensing in terms of ray-optical structures on Lorentzian manifolds, i.e., on 
general-relativistic spacetimes. In the following we discuss, within such a 
differential-geometrical setting, the relevance of Fermat's principle for
gravitational lensing. Later we specialize to the stationary case.

To that end we consider the following situation. In a 4-dimensional
Lorentzian manifold $(\mathcal{M}, g)$, to be interpreted as a general-relativistic
spacetime, we fix a point $q \in \mathcal{M}$ and a time-like $C^\infty$ embedding $\gamma\colon I \to \mathcal{M}$
from a real interval $I$ into $\mathcal{M}$. We interpret $q$ as an event where an
observation takes place, and we interpret $\gamma$ as the worldline of a light source. The
parametrization of $\gamma$ could be proper time, $g(\dot{\gamma},\dot{\gamma}) = -1$, but any other
smooth parametrization would do as well. We interpret the parametrization
of $\gamma$ as past-pointing, as indicated by the arrow in Figure 8.8.




Fig. 8.8. In a gravitational lensing situation there are several light rays from a
light source $\gamma$ to an observer $q$.



We fix a ray-optical structure $\mathcal{N}$ on $\mathcal{M}$, thereby specifying the properties
of the optical medium in which light propagation is to be considered. To
avoid pathologies we assume that $\mathcal{N}$ is causal in the sense of Definition 6.1.1.
Then each light ray, emitted from the light source $\gamma$ into the future and
received by the observer at $q$, corresponds to a ray $\lambda\colon [0,1] \to \mathcal{M}$ of the
ray-optical structure $\mathcal{N}$ with $\lambda(0) = q$ and $\lambda(1) = \gamma(T(\lambda))$, where $T(\lambda)$
denotes some parameter value, with the non-space-like vector $\dot{\lambda}(1)$ pointing
into the same half of the null cone bundle as the time-like vector $\dot{\gamma}(T(\lambda))$,
i.e., $g\big(\dot{\gamma}(T(\lambda)), \dot{\lambda}(1)\big) < 0$. If there is more than one such ray, then we are
in a gravitational lensing situation, see Figure 8.8. (Here it goes without
saying that two rays are identified if one is a reparametrization of the other.)
There might be a finite or a denumerably infinite number of rays, or a whole
continuum, e.g., a one-parameter family. In the latter case the observer might
see an extended image, such as an arc or a ring, of the pointlike source $\gamma$.

(a) The (non-stationary) vacuum case 

In the case of vacuum light propagation, $\mathcal{N} = \mathcal{N}^g$, the rays are the light-like
geodesics of the spacetime metric $g$. We can then use Fermat's principle in
the version of Theorem 7.3.2 to characterize the rays between $q$ and $\gamma$. For the
trial curves we have to consider all virtual rays, i.e., all light-like $C^\infty$ curves
$\lambda\colon [0,1] \to \mathcal{M}$ with $\lambda(0) = q$, $\lambda(1) = \gamma(T(\lambda))$ and $g\big(\dot{\gamma}(T(\lambda)), \dot{\lambda}(1)\big) < 0$.
By Theorem 7.3.2, such a trial curve is a ray if and only if it makes the arrival
time functional $T$ stationary; here the arrival time functional $T$ is defined by
the equation $\lambda(1) = \gamma(T(\lambda))$. If there are at least two stationary points $\lambda_1$
and $\lambda_2$ of the arrival time functional, with $\lambda_2$ not just a reparametrization of
$\lambda_1$, then we are in a gravitational lensing situation.

This version of Fermat’s principle has the advantage that it applies to 
time-dependent gravitational fields. E.g., it can be used to calculate the in- 
fluence of a gravitational wave sweeping over a gravitational lensing situation. 
Calculations of this kind have been carried through by Kovner [74] and by 
Faraoni [42]. 

If there is a continuous one-parameter family of light rays connecting $q$
and $\gamma$, then along any ray of this family the end-point must be conjugate
to the initial point in the sense of Definition 5.6.3. For a proof it suffices to
observe that a finite portion of the time-like curve $\gamma$ cannot be contained in the
vacuum light cone which is made up by the light-like geodesics issuing from
$q$. In this sense, in a vacuum gravitational lensing situation all parts of an
extended image, such as a ring or an arc, show the light source at the same
age. This is not necessarily true in a medium.

(b) The (non-stationary) matter case 

For light propagation in matter, $\mathcal{N} \neq \mathcal{N}^g$, we have to use Fermat's principle in
the more general version of Theorem 7.3.1. If we want to allow for dispersive
media, we have to choose a generalized observer field $W$ in the sense of
Definition 7.3.1 and we have to choose a frequency constant $\omega_o \in \mathbb{R}$. For
the trial curves we have to consider all curves $\xi \in \mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_o)$, in
the sense of Definition 7.3.2, further restricted by the additional assumption
$g\big(\dot{\gamma}(T(\lambda)), \dot{\lambda}(1)\big) < 0$, where $\lambda$ denotes the projection of $\xi$ to $\mathcal{M}$. By
Theorem 7.3.1, such a trial curve $\xi$ is a lifted
ray if and only if it makes the generalized optical path length functional $\mathcal{F}$
stationary, provided that the regularity condition (7.12) is satisfied along $\xi$
for one and, thus, for any Hamiltonian $H$ of $\mathcal{N}$. In comparison to the vacuum
case, two observations are to be emphasized. First, it is necessary to consider
trial curves in $T^*\mathcal{M}$ rather than in $\mathcal{M}$. Second, the variational principle will
give us only the light rays for a specific value of the frequency constant $\omega_o$.
Please note that $\omega_o$ fixes the frequency with which the respective light ray
is emitted by $\gamma$ and that $\omega_o$ is given in usual physical units only if $\gamma$ is
parametrized by proper time.

This variational principle can be applied to gravitational lensing in
time-dependent gravitational fields and in time-dependent media. As an example,
we consider a non-magnetized plasma, i.e., a ray-optical structure $\mathcal{N}$ given
by a Hamiltonian of the form (8.42) on an arbitrary Lorentzian spacetime
manifold $(\mathcal{M},g)$ with an arbitrary spacetime function $\omega_p$. The trial curves
$\xi \in \mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_o)$ are characterized, in terms of their representations
$(x(s),p(s))$ in a natural chart, by the equations (5.10), (5.11), and (7.10),
i.e.,

$$g^{ab}\big(x(s)\big)\, p_a(s)\, p_b(s) = -\omega_p\big(x(s)\big)^2 , \tag{8.67}$$

$$\dot{x}^a(s) = k(s)\, g^{ab}\big(x(s)\big)\, p_b(s) , \tag{8.68}$$



$$W^a(s)\, \dot{p}_a(s) = -k(s)\, W^a(s)\, \frac{\partial H}{\partial x^a}\big(x(s),p(s)\big) , \tag{8.69}$$



supplemented with the boundary conditions that $x^a(0)$ are the fixed
coordinates of $q$, $x^a(1)$ are coordinates of a point on $\gamma$, and $W^a(1)\,p_a(1) = -\omega_o$.
In addition, we have to restrict to curves with $g_{ab}\,\dot{\gamma}^a(T(\lambda))\,\dot{x}^b(1) < 0$. If $\omega_p$
has no zeros, (8.67) and (8.68) imply that the projected curves $\lambda = \tau^*_{\mathcal{M}} \circ \xi$
are time-like for every $\xi \in \mathfrak{M}(\mathcal{N},q,\gamma,W,\omega_o)$. Moreover, it is easy to check
that the relation between $\xi$ and $\lambda$ is one-to-one. The projected trial curves
$\lambda$ are characterized, in terms of their coordinate representations $x(s)$, by the
differential equation



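$$W^a\, \frac{d}{ds} \left( \frac{\omega_p(x)\, g_{ab}(x)\, \dot{x}^b}{\sqrt{-g_{fh}(x)\, \dot{x}^f \dot{x}^h}} \right) = -W^a \left( \frac{1}{2}\, \frac{\partial g^{de}(x)}{\partial x^a}\, \frac{\omega_p(x)\, g_{ce}(x)\, \dot{x}^c\, g_{db}(x)\, \dot{x}^b}{\sqrt{-g_{fh}(x)\, \dot{x}^f \dot{x}^h}} + \frac{\partial \omega_p(x)}{\partial x^a}\, \sqrt{-g_{fh}(x)\, \dot{x}^f \dot{x}^h} \right) , \tag{8.70}$$

supplemented with the boundary conditions that $x^a(0)$ are the coordinates
of $q$, $x^a(1)$ are coordinates of a point on $\gamma$, and

$$\left. \frac{\omega_p(x)\, g_{ab}(x)\, W^a\, \dot{x}^b}{\sqrt{-g_{fh}(x)\, \dot{x}^f \dot{x}^h}} \right|_{s=1} = -\omega_o . \tag{8.71}$$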
(8.70) and (8.71) fix the pseudo-Euclidean angle between the (projected) trial
curve and $W$. The generalized optical path length, which was introduced in
Definition 7.3.3 as a functional $\xi \mapsto \mathcal{F}(\xi)$, reduces to a functional on the
projected curves, $\lambda \mapsto \mathcal{F}(\lambda)$, given by

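$$\mathcal{F}(\lambda) = - \int_0^1 \frac{\omega_p\big(\lambda(s)\big)}{\omega_o}\, \sqrt{-g\big(\dot{\lambda}(s),\dot{\lambda}(s)\big)}\; ds + T(\lambda) . \tag{8.72}$$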

Thus, the light rays emitted on $\gamma$ with the frequency $\omega_o$ are the extremals of
the functional (8.72) among all curves $\lambda$ between $q$ and $\gamma$ whose coordinate
representations $s \mapsto x(s)$ satisfy (8.70) and (8.71). Please note that $\mathcal{F}$
reduces to the arrival time functional $T$ in the limit $\omega_p \to 0$, but that (8.70)
and (8.71) cannot be used in this limit since they contain undetermined
expressions of the form 0/0. For this reason, a somewhat inconvenient matching
procedure must be used if regions with $\omega_p = 0$ and regions with $\omega_p \neq 0$ are
to be treated in a unified setting.

(c) The stationary case 

Now we want to consider the situation that $\mathcal{N}$ is a stationary ray-optical
structure and that $\gamma$ is an integral curve of the distinguished time-like vector
field $W \in \mathcal{G}_{\mathcal{N}}$, i.e., that the light source is at rest with respect to this time-like
vector field. Moreover, we shall assume that the assumptions of the
reduction theorem (i.e., of Theorem 6.5.1) are satisfied. The gravitational lensing
situation can then be described in terms of space rather than in terms of
spacetime, viz., in terms of the reduced ray-optical structure. If the reduced
ray-optical structure is strongly regular (which is true in virtually all
situations of physical interest in which the preceding assumptions are valid), the
Morse theory developed in Sect. 7.5 can be applied.

We want to illustrate the general features of this approach by way of
example. To that end we consider, on a 4-dimensional Lorentzian spacetime
manifold $(\mathcal{M},g)$, a ray-optical structure $\mathcal{N}$ determined by a Hamiltonian of
the form (8.42), i.e., a dispersion relation of the form

$$g^{ab}(x)\, p_a\, p_b + \omega_p(x)^2 = 0 \tag{8.73}$$

which describes light propagation in a non-magnetized plasma. Here $g^{ab}$ are
the contravariant components of the spacetime metric and $\omega_p$ is the plasma
frequency. The spacetime metric is supposed to describe a cosmological model
with some local mass concentrations that act as "deflectors"; the light rays
are supposed to be influenced by some plasma clouds, situated in regions
where the function $\omega_p$ is different from zero.

We want to assume that $\mathcal{N}$ is stationary, i.e., that there is a time-like
vector field $W$ in the symmetry algebra $\mathcal{G}_{\mathcal{N}}$. This means that $W$ must be a
conformal Killing field of the spacetime metric $g$,

$$L_W\big(e^{-2f} g\big) = 0 \tag{8.74}$$

where $f = \tfrac{1}{2} \ln\big( -g(W,W) \big)$, and that the rescaled plasma frequency must be
constant along each integral curve of $W$,

$$L_W\big(e^{f} \omega_p\big) = 0 . \tag{8.75}$$

These assumptions are satisfied, e.g., if there is an open subset $\mathcal{V}$ in $\mathcal{M}$,
invariant under the flow of $W$, with the following properties. $(\mathcal{M},g)$ is a
Robertson-Walker spacetime without plasma ($\omega_p = 0$) on $\mathcal{M} \setminus \mathcal{V}$, whereas it
is a stationary spacetime with a stationary plasma ($L_W g = 0$ and $L_W \omega_p = 0$)
on $\mathcal{V}$. $\mathcal{V}$ is to be interpreted as the region where the influence of the deflector
mass and of the plasma cloud on the light rays is to be taken into account.
Instead of a Robertson-Walker spacetime we could use any other conformally
stationary cosmological background on $\mathcal{M} \setminus \mathcal{V}$.

To apply the reduction theorem, we have to assume that there is a global
timing function $t\colon \mathcal{M} \to \mathbb{R}$ for $W$ which gives us a global diffeomorphism
$(\pi, t)\colon \mathcal{M} \to \hat{\mathcal{M}} \times \mathbb{R}$, please recall Figure 6.3. To construct the reduced
ray-optical structure according to Theorem 6.5.1, we choose a frequency constant
$\omega_o > 0$. From (8.73) we read that rays with $p_a W^a = -\omega_o$ cannot leave the
region

$$\mathcal{M}_{\omega_o} = \big\{ q \in \mathcal{M} \;\big|\; e^{2f(q)}\, \omega_p(q)^2 < \omega_o^2 \big\} . \tag{8.76}$$

If we restrict to this region, all assumptions of Theorem 6.5.1 are satisfied
and the reduction can be carried through, giving us a reduced ray-optical
structure $\hat{\mathcal{N}}_{\omega_o}$ on the 3-dimensional space $\hat{\mathcal{M}}_{\omega_o} = \pi(\mathcal{M}_{\omega_o})$. In the vacuum
case $\omega_p = 0$ we have, of course, $\hat{\mathcal{M}}_{\omega_o} = \hat{\mathcal{M}}$ for all $\omega_o > 0$, otherwise it might
be necessary to excise some parts from spacetime where the plasma frequency
is so large that rays with frequency constant $\omega_o$ cannot enter. However, if the
function $\omega_p$ has spatially compact support we always have $\hat{\mathcal{M}}_{\omega_o} = \hat{\mathcal{M}}$ for
sufficiently large $\omega_o$.

With the results from Sect. 6.6 it is easy to find a Hamiltonian for the
reduced ray-optical structure $\hat{\mathcal{N}}_{\omega_o}$. First we recall that, by (8.74), the
spacetime metric induces a positive definite metric $\hat{g}$ and a one-form $\hat{\phi}$ on $\hat{\mathcal{M}}$,
according to (6.98). The one-form $\hat{\phi}$ vanishes if and only if $W$ is orthogonal
to the hypersurfaces $t = \mathrm{const}$. In coordinates with $x^4 = t$ and $\partial/\partial x^4 = W$
the spacetime metric takes the form (6.101). Moreover, (8.75) implies that
there is a function $\hat{\omega}_p\colon \hat{\mathcal{M}} \to \mathbb{R}$ such that

$$e^{f} \omega_p = \pi^* \hat{\omega}_p . \tag{8.77}$$

Hence, in coordinates with $x^4 = t$ and $\partial/\partial x^4 = W$ the dispersion relation
(8.73) is equivalent to

$$\hat{g}^{\mu\sigma} \big( p_\mu - p_4 \hat{\phi}_\mu \big) \big( p_\sigma - p_4 \hat{\phi}_\sigma \big) - p_4^2 + \hat{\omega}_p^2 = 0 \tag{8.78}$$

with greek indices running from 1 to 3. According to the general rules found
in Sect. 6.5, the left-hand side of (8.78) gives us a Hamiltonian for the reduced
ray-optical structure if $p_4$ is replaced with $-\omega_o$. Since we are always free to
multiply the Hamiltonian with a non-zero function, this implies that $\hat{\mathcal{N}}_{\omega_o}$ is
generated by the Hamiltonian
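$$\hat{H}(x,p) = \frac{1}{2} \left( \frac{\hat{g}^{\mu\sigma} \big( p_\mu + \omega_o \hat{\phi}_\mu \big) \big( p_\sigma + \omega_o \hat{\phi}_\sigma \big)}{\omega_o^2 - \hat{\omega}_p^2} - 1 \right) . \tag{8.79}$$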



To study gravitational lensing we fix two points $q$ and $q'$ in $\hat{\mathcal{M}}_{\omega_o}$ and we
ask how many rays of $\hat{\mathcal{N}}_{\omega_o}$ go from $q$ to $q'$. It is easy to check that for the
Hamiltonian (8.79) the relevant map of Definition 5.2.2 is a global
diffeomorphism onto its image, i.e., that $\hat{\mathcal{N}}_{\omega_o}$ is strongly hyperregular according to
Definition 5.2.2. Thus, the Morse theory developed in Sect. 7.5 applies. For
the Hamiltonian (8.79), the space of trial curves $\mathfrak{M}(\hat{H},q,q')$ is equal to the
set of all curves, defined on the interval $[0,1]$, in $\hat{\mathcal{M}}_{\omega_o}$ from $q$ to $q'$ with



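$$\hat{g}_{\mu\nu}\, \dot{x}^\mu \dot{x}^\nu\, \big( \omega_o^2 - \hat{\omega}_p^2 \big) = \mathrm{const.} \tag{8.80}$$

and the action functional is given by

$$\lambda \longmapsto \int_0^1 \left( \sqrt{ \big( \omega_o^2 - \hat{\omega}_p^2 \big)\, \hat{g}_{\mu\nu}\, \dot{x}^\mu \dot{x}^\nu } - \omega_o\, \hat{\phi}_\mu\, \dot{x}^\mu \right) ds \tag{8.81}$$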



for each $\lambda \in \mathfrak{M}(\hat{H},q,q')$ with coordinate representation $x \in H^1([0,1],\mathbb{R}^3)$. Please
note that, up to the factor $\omega_o > 0$, the action functional (8.81) equals the
optical path length (6.84) of the lifted ray $\xi$ associated with the ray $\lambda$. In the
vacuum case $\omega_p = 0$, the optical path length can be reinterpreted as a travel
time according to Proposition 6.5.3.

According to Fermat's principle in the version of Theorem 7.5.1, the light
rays from $q$ to $q'$ are the stationary points of the action functional (8.81) or,
equivalently, of the optical path length functional. In the static (i.e.,
non-rotating) case we can choose the timing function in such a way that $\hat{\phi} = 0$.
Then the optical path length functional is equal to the length functional of
the frequency-dependent metric

$$\hat{g}_o = \left( 1 - \frac{\hat{\omega}_p^2}{\omega_o^2} \right) \hat{g} . \tag{8.82}$$

Please note that in the rotating case the optical path length functional is
not invariant under orientation-reversing reparameterizations. Hence, in that
case a light ray from $q$ to $q'$ does not travel along the same path as a light
ray from $q'$ to $q$.

Since, for the Hamiltonian (8.79), the matrix

$$\left( \frac{\partial^2 \hat{H}}{\partial p_\mu\, \partial p_\sigma} \right) = \left( \frac{\hat{g}^{\mu\sigma}}{\omega_o^2 - \hat{\omega}_p^2} \right)$$

is positive definite on $\hat{\mathcal{N}}_{\omega_o}$, the Morse index theorem in the version of
Theorem 7.5.4 implies that along each ray the extended Morse index is equal to the
number of conjugate points counted with multiplicity, see (7.56). In
particular, a ray gives a strict local minimum of the optical path length functional
if and only if it is free of conjugate points whereas it gives a saddle-point
if there is a conjugate point in the interior. Since at each conjugate point
neighboring light rays are crossing from one side to the other, an odd Morse
index is associated with a side-reversed image in comparison to an even Morse
index. This is observable for light sources surrounded by irregular structures,
e.g., for quasars with jets or lobes.

For the vacuum rays it is known that the occurrence of conjugate points
gives rise to multiple imaging situations. Under certain assumptions on the
causal and topological structure of spacetime, the converse is also true, i.e.,
in any multiple imaging situation at least one of the rays must contain a
pair of conjugate points. For a general proof of these facts we refer to Perlick
[111]. This is an interesting result since, in combination with Einstein's field
equation, the existence of conjugate points along a vacuum light ray allows
one to estimate the matter density along the ray, see Padmanabhan and
Subramanian [102]. The above-mentioned Morse index theorem might be useful
for generalizing this result to the case of light rays in media, at least for
stationary situations and for media which satisfy the positive-definiteness
assumption of Theorem 7.5.4.

Finally we want to prove an odd number theorem, i.e., we want to show
that, under certain reasonable assumptions, a transparent deflector always
produces an odd number of images. To that end we generalize a
differential-topological argument, first published by McKenzie [95], into our setting of
stationary ray-optical structures. For the sake of comparison the reader is
referred to Dyer and Roeder [33] who prove an odd number theorem for spherical
deflectors, and to Burke [24] and Petters [113] where odd number theorems
are given for thin deflectors and weak gravitational fields. An argument very
similar to Burke's but under slightly more general assumptions was worked
out by Lombardi [86]. A general discussion of odd number theorems can also
be found in Schneider, Ehlers, and Falco [128].

The following argument applies to all situations in which the assumptions 
of the reduction theorem (i.e., of Theorem 6.5.1) are satisfied. As before, we 
fix two points $q$ and $q'$ in $\hat{\mathcal{M}}$ and we ask how many rays of 
$\hat{\mathcal{N}}_{\omega_o}$ go from $q$ to $q'$. We need the following three 
additional assumptions (see Figure 8.9). 

(a) There is an open subset $B$ in $\hat{\mathcal{M}}$ with the following 
properties: $q \in B$, and $B$ is contractible to $q$, i.e., there is a 
differentiable map $\Phi : [0,1] \times B \to B$ with $\Phi(0,r) = r$ and 
$\Phi(1,r) = q$ for all $r \in B$. The closure of $B$ is compact in 
$\hat{\mathcal{M}}$. The boundary $S = \partial B$ of $B$ is diffeomorphic to a 
2-sphere and $q' \in S$. 

(b) Every ray of $\hat{\mathcal{N}}_{\omega_o}$ issuing from $q$ intersects $S$ 
if sufficiently extended. 

(c) Every non-zero vector in $T\hat{\mathcal{M}}$ is the tangent vector of a ray 
of $\hat{\mathcal{N}}_{\omega_o}$, and this ray is unique up to extension and 
reparametrization. 

Fig. 8.9. Under the assumptions stated in the text, the rays issuing from $q$ 
define a continuous map from the small sphere $S_0$ to the big sphere $S$. The 
degree of this map must be equal to 1, which proves that there is an odd number 
of rays from $q$ to $q'$. 

In physical terms, conditions (a) and (b) prohibit non-transparent deflectors. 
Such a non-transparent deflector would have to be modeled either as a hole in 
$\hat{\mathcal{M}}$, thereby violating condition (a), or as a compact region in 
which some rays are trapped, thereby violating (b). Condition (c), roughly 
speaking, makes sure that in any spatial direction there is exactly one ray of 
$\hat{\mathcal{N}}_{\omega_o}$. This condition is satisfied, e.g., if 
$\mathcal{N}$ is the vacuum ray-optical structure. Please note that condition 
(c) could not hold if the ray-optical structure were not strongly regular. In 
the dispersive case, conditions (a), (b), and (c) have, of course, to be checked 
for each value of the frequency constant $\omega_o$ individually. 

Under these assumptions, every ray issuing from $q$ intersects an 
infinitesimally small sphere $S_0$ around $q$ in exactly one point $r$, and it 
reaches the sphere $S$ at some point $f(r)$. This defines a differentiable map 
$f : S_0 \to S$. We now fix a regular value of $f$, i.e., we fix a point 
$d \in S$ such that for all $r \in S_0$ with $f(r) = d$ the tangent map 
$T_r f : T_r S_0 \to T_{f(r)} S$ is a bijection. Please note that, according to 
the well-known Sard Theorem (see, e.g., Abraham and Robbin [2], p. 37), almost 
all points in $S$ are regular values of $f$. Clearly, $d$ is a regular value of 
$f$ if and only if $d$ is not conjugate to $q$ along any ray in $B$. With a 
regular value $d$ chosen we define the degree of $f$ as 

$$\deg(f) = \sum_{f(r) = d} \operatorname{sgn}(r) \qquad (8.84)$$



where $\operatorname{sgn}(r)$ is equal to $+1$ if the differential $T_r f$ is 
orientation preserving and equal to $-1$ otherwise. Here we refer, of course, to 
the orientations of the spheres according to which $q$ “lies to their inner 
sides”. It is a standard theorem in differential topology that $\deg(f)$ is 
well-defined, i.e., independent of the choice of $d$, see, e.g., Guillemin and 
Pollack [54] for a detailed discussion. Moreover, our assumption of $B$ being 
contractible to $q$ gives an orientation-preserving diffeomorphism from $S$ to 
$S_0$ and a smooth deformation of $f$ into the identity, i.e., it implies that 
$f$ is homotopic to the identity map. As it is well known that, for maps between 
compact manifolds without boundary, the degree is a homotopy invariant, the 
degree of $f$ must be the degree of the identity, i.e., $\deg(f) = 1$. 
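
The way the degree is computed in (8.84) can be tried out in a one-dimensional 
analogue in which both spheres are replaced by circles. The following Python 
sketch is only an illustration and not part of the construction above: the 
circle map given by `lift`, its amplitude 1.5, and all helper names are 
assumptions chosen for this example. The map is homotopic to the identity but 
not injective, so a regular value may have one or three preimages, while the 
signed count should equal 1 in either case. 

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def lift(phi, amplitude=1.5):
    # Hypothetical example map: lift of a circle map homotopic to the identity,
    # i.e. lift(phi + 2*pi) = lift(phi) + 2*pi.
    return phi + amplitude * np.sin(phi)

def lift_derivative(phi, amplitude=1.5):
    # Derivative of the lift; its sign plays the role of sgn(r) in (8.84).
    return 1.0 + amplitude * np.cos(phi)

def bisect(g, a, b, tol=1e-12):
    # Plain bisection, assuming g(a) and g(b) have opposite signs.
    ga = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0.0:
            b = m
        else:
            a, ga = m, gm
    return 0.5 * (a + b)

def degree(d, amplitude=1.5, samples=20000):
    # Signed count of all phi in [0, 2*pi) with lift(phi) = d (mod 2*pi).
    # Assumes d is a regular value that does not coincide with a grid point.
    grid = np.linspace(0.0, TWO_PI, samples, endpoint=False)
    step = TWO_PI / samples
    total, preimages = 0, []
    for k in range(-2, 3):
        g = lambda phi, k=k: lift(phi, amplitude) - d - TWO_PI * k
        left, right = g(grid), g(grid + step)
        for a in grid[(left < 0.0) != (right < 0.0)]:
            root = bisect(g, a, a + step)
            preimages.append(root)
            total += int(np.sign(lift_derivative(root, amplitude)))
    return total, sorted(preimages)

if __name__ == "__main__":
    for d in (1.0, 3.0):
        total, preimages = degree(d)
        print(d, len(preimages), total)
```

For the two sample values used here, `d = 1.0` has a single preimage and 
`d = 3.0` has three, two of them with positive and one with negative sign, so 
the signed sum is 1 in both cases, as homotopy invariance requires. 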

We now consider the rays from $q$ to $q'$. We exclude the exceptional case 
that $q'$ is conjugate to $q$ along some ray, i.e., we assume that $q'$ is a 
regular value of $f$. Then the definition of the degree implies that 

$$\deg(f) = n_+ - n_- \qquad (8.85)$$

where $n_\pm$ is the number of rays from $q$ to $q'$ in $B$ such that 
$\operatorname{sgn}(r) = \pm 1$. Here $r$ denotes the intersection of the ray 
with $S_0$. Clearly, $n_+$ is the number of rays with an even number of 
conjugate points and $n_-$ is the number of rays with an odd number of 
conjugate points. As the degree of $f$ is equal to 1, (8.85) gives 
$n_+ = 1 + n_-$ and hence $n_+ + n_- = 1 + 2n_-$, i.e., the number of rays from 
$q$ to $q'$ is odd. 
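
The counting rule can be illustrated numerically in the much simpler thin-lens, 
weak-field setting of the theorems for thin deflectors cited above, not in the 
general stationary setting treated here. The following Python sketch assumes a 
softened point lens with deflection angle $\alpha(\theta) = \theta / (\theta^2 + c^2)$ 
in units of the Einstein angle; this lens model, the core parameter `core`, and 
all function names are assumptions made for illustration. For a source at 
angular position $\beta \neq 0$ all images lie on the axis through lens and 
source, and the sign of the Jacobian determinant of the lens map plays the role 
of $\operatorname{sgn}(r)$. 

```python
import numpy as np

def image_positions(beta, core):
    # Lens equation beta = theta - theta / (theta^2 + core^2), rewritten as
    # theta^3 - beta*theta^2 + (core^2 - 1)*theta - beta*core^2 = 0.
    coeffs = [1.0, -beta, core**2 - 1.0, -beta * core**2]
    roots = np.roots(coeffs)
    # Keep only the real roots; these are the image positions on the axis.
    return np.sort(roots[np.abs(roots.imag) < 1e-9].real)

def parity(theta, core):
    # Sign of the Jacobian determinant of the axisymmetric lens map.
    s = theta**2 + core**2
    tangential = 1.0 - 1.0 / s                  # 1 - alpha(theta) / theta
    radial = 1.0 - (core**2 - theta**2) / s**2  # 1 - d(alpha)/d(theta)
    return int(np.sign(tangential * radial))

def count_images(beta, core):
    # Returns (n_plus, n_minus), the numbers of images with parity +1 and -1.
    signs = [parity(t, core) for t in image_positions(beta, core)]
    return signs.count(1), signs.count(-1)

if __name__ == "__main__":
    for beta in (0.1, 3.0):
        n_plus, n_minus = count_images(beta, core=0.2)
        print(beta, n_plus, n_minus, n_plus + n_minus)
```

With `core = 0.2`, a source at `beta = 0.1` gives `(n_plus, n_minus) = (2, 1)` 
and a source at `beta = 3.0` gives `(1, 0)`. In both cases the difference is 1 
and the total number of images is odd. Source positions lying exactly on a 
caustic, which correspond to the excluded conjugate case, are not handled by 
this sketch. 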

For this argument stationarity was, of course, essential since otherwise 
there is no space in which it could be applied. Even for vacuum rays it is 
hard to see how a similar degree argument could give an odd number theorem 
in a spacetime setting, i.e., without assuming stationarity. (This problem 
was discussed in detail by Gottlieb [51].) For that reason it is important 
to know that McKenzie [95] was able to give another argument to prove 
that a transparent deflector produces an odd number of images. This was 
done for vacuum light rays in a globally hyperbolic spacetime, using Morse 
theoretical results of Uhlenbeck [144]. Unfortunately, it was necessary for 
McKenzie to impose some additional assumptions on the spacetime metric the 
physical meaning of which is obscure. Therefore it seems fair to say that in the 
non-stationary case a satisfactory odd number theorem is still missing, even 
for vacuum rays. Infinite dimensional Morse theory, as it was developed for 
vacuum rays between a point and a time-like curve in a Lorentzian manifold 
partially by Perlick [110] and, to a fuller extent, by Giannoni, Masiello, and 
Piccione [47], [48], could be a useful tool. In the non-stationary non-vacuum 
case, there are not even rudiments of a Morse theory for light rays between 
a point and a time-like curve. So there is still a lot to be done in the future. 